xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/6] Support for running secondary emulators
@ 2014-03-04 11:40 Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah2'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.


The modifications are broken down as follows:

Patch #1 basically just moves some code around to make subsequent patches
more obvious. The patch also removes the has_dm flag in hvmemul_do_io() as
it is no longer necessary to special-case PVH domains in this way. (The I/O
can be completed by hvm_send_assist_req() later, when it is discovered there
is no shared ioreq page).

Patch #2 tidies up some uses of ioreq_t as suggested by Andrew Cooper.

Patch #3 again is largely code movement, from various places into a new
hvm_ioreq_server structure. There should be no functional change at this
stage as the ioreq server is still created at domain initialisation time (as
were its contents prior to this patch).

Patch #4 is the first functional change. The ioreq server struct
initialisation is now deferred until something actually tries to play with
the HVM parameters which reference it. In practice this is QEMU, which
needs to read the ioreq pfns so it can map them.
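
For illustration only (this is not code from the series), reading and mapping
those pfns with the existing libxc calls looks roughly like this; error
handling is omitted and the catch-all server is assumed:

    #include <sys/mman.h>
    #include <xenctrl.h>
    #include <xen/hvm/params.h>
    #include <xen/hvm/ioreq.h>

    /* Illustrative sketch: map the synchronous ioreq page of domain 'dom'. */
    static shared_iopage_t *map_ioreq_page(xc_interface *xch, domid_t dom)
    {
        unsigned long pfn;

        /* Reading this parameter is what now triggers ioreq server creation. */
        if ( xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN, &pfn) < 0 )
            return NULL;

        return xc_map_foreign_range(xch, dom, XC_PAGE_SIZE,
                                    PROT_READ | PROT_WRITE, pfn);
    }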

Patch #5 is the big one. This moves from a single ioreq server per domain
to a list. The server that is created when the HVM parameters are referenced
is given id 0 and is considered to be the 'catch all' server which is, after
all, how QEMU is used. Any secondary emulator, created using the new API
in xenctrl.h, will have id 1 or above and only gets ioreqs when I/O hits one
of its registered IO ranges or PCI devices.
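
To give a flavour of the new interface, registration by a secondary emulator
would look something like the sketch below. Note that the
xc_hvm_*_ioreq_server names and signatures here are illustrative of the
intent only and may not match exactly what ends up in xenctrl.h in this
series:

    #include <xenctrl.h>

    /*
     * Illustrative sketch only: names/signatures are not guaranteed to match
     * the series; see the tools/libxc changes in patch #5 for the real
     * interface.
     */
    static int register_secondary_emulator(xc_interface *xch, domid_t dom,
                                           ioservid_t *id)
    {
        int rc;

        rc = xc_hvm_create_ioreq_server(xch, dom, id);   /* *id will be >= 1 */
        if ( rc < 0 )
            return rc;

        /* Claim an 8-byte port-IO range; MMIO ranges and PCI devices are
         * registered in a similar fashion. */
        return xc_hvm_map_io_range_to_ioreq_server(xch, dom, *id,
                                                   0 /* port IO */,
                                                   0xc000, 0xc007);
    }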

Patch #6 pulls the PCI hotplug controller emulation into Xen. This is
necessary to allow a secondary emulator to hotplug a PCI device into the VM.
The code implements the controller in the same way as upstream QEMU and thus
the variant of the DSDT ASL used for upstream QEMU is retained.


There are no modifications to libxl to actually invoke a secondary emulator
at this stage. The only changes made are to increase the number of special
pages reserved for a VM (to allow the use of more than one emulator) and to
call the new PCI hotplug API when attaching or detaching PCI devices.
The demo emulator can simply be invoked from a shell and will hotplug its
device onto the PCI bus (and remove it again when it's killed). The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be a demo :-)
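
For anyone reading the demu code, the heart of any such emulator is the usual
ioreq handshake. A hand-waved sketch of the per-vcpu handling (this is not
the demu.git code; the event channel plumbing, the actual device model and
error handling are all elided) is:

    #include <xen/hvm/ioreq.h>

    /* 'p' points into the mapped shared_iopage_t for the notifying vcpu. */
    static void handle_one_ioreq(ioreq_t *p)
    {
        if ( p->state != STATE_IOREQ_READY )
            return;

        __sync_synchronize();          /* read state before the ioreq body */
        p->state = STATE_IOREQ_INPROCESS;

        if ( p->dir == IOREQ_READ )
            p->data = ~0ul;            /* a real device model supplies data */
        /* IOREQ_WRITE: consume p->data (or the guest buffer if data_is_ptr) */

        __sync_synchronize();          /* publish data before the new state */
        p->state = STATE_IORESP_READY; /* then notify Xen via p->vp_eport */
    }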

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

v2:
 - First non-RFC posting


* [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  2014-03-04 12:21   ` Jan Beulich
  2014-03-04 11:40 ` [PATCH v2 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

To simplify creation of the ioreq server abstraction in a
subsequent patch, this patch centralizes all use of the shared
ioreq structure and the buffered ioreq ring in the source module
xen/arch/x86/hvm/hvm.c.

Also, hvm_send_assist_req() is re-worked slightly to complete I/O
immediately in the case where there is no emulator (i.e. the shared
IOREQ ring has not been set up). This should handle the case currently
covered by has_dm in hvmemul_do_io().

This patch also adds some missing emacs boilerplate in the places where I
needed it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |   50 ++++++-----------
 xen/arch/x86/hvm/hvm.c            |  108 ++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/io.c             |   94 +-------------------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |   13 ++++-
 xen/include/asm-x86/hvm/hvm.h     |   14 ++++-
 xen/include/asm-x86/hvm/support.h |   19 +++----
 6 files changed, 160 insertions(+), 138 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..154d14e 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p[1];
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,15 +160,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state =
         (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -193,6 +171,7 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
+    p->state = STATE_IOREQ_NONE;
     p->dir = dir;
     p->data_is_ptr = value_is_ptr;
     p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
@@ -232,20 +211,15 @@ static int hvmemul_do_io(
             vio->io_state = HVMIO_handle_mmio_awaiting_completion;
         break;
     case X86EMUL_UNHANDLEABLE:
-        /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        rc = X86EMUL_RETRY;
+        if ( !hvm_send_assist_req(curr, p) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
         }
-        else
-        {
-            rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
-                vio->io_state = HVMIO_none;
-            else if ( p_data == NULL )
-                rc = X86EMUL_OKAY;
-        }
+        else if ( p_data == NULL )
+            rc = X86EMUL_OKAY;
+
         break;
     default:
         BUG();
@@ -1292,3 +1266,13 @@ struct segment_register *hvmemul_get_seg_reg(
         hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
     return &hvmemul_ctxt->seg_reg[seg];
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9e85c13..b8bf225 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,6 +345,24 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
+bool_t hvm_io_pending(struct vcpu *v)
+{
+    ioreq_t *p;
+
+    if ( !(p = get_ioreq(v)) )
+         return 0;
+
+    return ( p->state != STATE_IOREQ_NONE );
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p;
@@ -1407,7 +1425,86 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+int hvm_buffered_io_send(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir  = p->dir;
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+    
+    bp.data = p->data;
+    bp.addr = p->addr;
+    
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+    
+    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
+           &bp, sizeof(bp));
+    
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
+               &bp, sizeof(bp));
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(v->domain,
+            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+    
+    return 1;
+}
+
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
 {
     ioreq_t *p;
 
@@ -1425,6 +1522,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
         return 0;
     }
 
+    p->dir = proto_p->dir;
+    p->data_is_ptr = proto_p->data_is_ptr;
+    p->type = proto_p->type;
+    p->size = proto_p->size;
+    p->addr = proto_p->addr;
+    p->count = proto_p->count;
+    p->df = proto_p->df;
+    p->data = proto_p->data;
+
     prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
 
     /*
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..576641c 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,85 +46,6 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
-{
-    struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    bp.type = p->type;
-    bp.dir  = p->dir;
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p[1];
@@ -150,25 +71,14 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
+    ioreq_t p[1];
 
     p->type = IOREQ_TYPE_INVALIDATE;
     p->size = 4;
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req(v, p);
 }
 
 int handle_mmio(void)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 40167d6..0421623 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    const ioreq_t *ioreq = get_ioreq(v);
 
     /*
      * A pending IO emulation may still be not finished. In this case, no
@@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
@@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     /* nvcpu.guest_cr is what L2 write to cr actually. */
     __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dcc3483..40aeddf 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -227,7 +228,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
@@ -339,6 +340,7 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx);
 void hvm_migrate_timers(struct vcpu *v);
+bool_t hvm_io_pending(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
@@ -522,3 +524,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..05ef5c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,19 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
@@ -142,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v2 2/6] ioreq-server: tidy up use of ioreq_t
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch tidies up various occurrences of single-element ioreq_t
arrays on the stack and improves coding style.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c |   38 +++++++++++++++++++-------------------
 xen/arch/x86/hvm/hvm.c     |    2 ++
 xen/arch/x86/hvm/io.c      |   37 +++++++++++++++++--------------------
 3 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 154d14e..73808f3 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,7 +57,7 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t p[1];
+    ioreq_t p;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
@@ -171,39 +171,39 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
-    p->state = STATE_IOREQ_NONE;
-    p->dir = dir;
-    p->data_is_ptr = value_is_ptr;
-    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
-    p->size = size;
-    p->addr = addr;
-    p->count = *reps;
-    p->df = df;
-    p->data = value;
+    p.state = STATE_IOREQ_NONE;
+    p.dir = dir;
+    p.data_is_ptr = value_is_ptr;
+    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
+    p.size = size;
+    p.addr = addr;
+    p.count = *reps;
+    p.df = df;
+    p.data = value;
 
     if ( dir == IOREQ_WRITE )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( is_mmio )
     {
-        rc = hvm_mmio_intercept(p);
+        rc = hvm_mmio_intercept(&p);
         if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(p);
+            rc = hvm_buffered_io_intercept(&p);
     }
     else
     {
-        rc = hvm_portio_intercept(p);
+        rc = hvm_portio_intercept(&p);
     }
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
     case X86EMUL_RETRY:
-        *reps = p->count;
-        p->state = STATE_IORESP_READY;
+        *reps = p.count;
+        p.state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist(p);
+            hvm_io_assist(&p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -212,7 +212,7 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
         rc = X86EMUL_RETRY;
-        if ( !hvm_send_assist_req(curr, p) )
+        if ( !hvm_send_assist_req(curr, &p) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
@@ -234,7 +234,7 @@ static int hvmemul_do_io(
 
  finish_access:
     if ( dir == IOREQ_READ )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( p_data != NULL )
         memcpy(p_data, &vio->io_data, size);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b8bf225..e07cae3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -349,7 +349,9 @@ static ioreq_t *get_ioreq(struct vcpu *v)
 {
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
     return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 576641c..c9adb94 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -48,22 +48,19 @@
 
 void send_timeoffset_req(unsigned long timeoff)
 {
-    ioreq_t p[1];
+    ioreq_t p = {
+        .type = IOREQ_TYPE_TIMEOFFSET,
+        .size = 8,
+        .count = 1,
+        .dir = IOREQ_WRITE,
+        .data = timeoff,
+        .state = STATE_IOREQ_READY,
+    };
 
     if ( timeoff == 0 )
         return;
 
-    memset(p, 0, sizeof(*p));
-
-    p->type = IOREQ_TYPE_TIMEOFFSET;
-    p->size = 8;
-    p->count = 1;
-    p->dir = IOREQ_WRITE;
-    p->data = timeoff;
-
-    p->state = STATE_IOREQ_READY;
-
-    if ( !hvm_buffered_io_send(p) )
+    if ( !hvm_buffered_io_send(&p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
@@ -71,14 +68,14 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *v = current;
-    ioreq_t p[1];
-
-    p->type = IOREQ_TYPE_INVALIDATE;
-    p->size = 4;
-    p->dir = IOREQ_WRITE;
-    p->data = ~0UL; /* flush all */
-
-    (void)hvm_send_assist_req(v, p);
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    (void)hvm_send_assist_req(v, &p);
 }
 
 int handle_mmio(void)
-- 
1.7.10.4


* [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction.
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  2014-03-04 12:50   ` Jan Beulich
  2014-03-04 11:40 ` [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Collect the data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions.

This patch also adds some more missing emacs boilerplate in the places
where I needed it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/hvm.c           |  322 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |   18 ++-
 xen/include/asm-x86/hvm/vcpu.h   |   12 +-
 3 files changed, 248 insertions(+), 104 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e07cae3..d9586b2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,28 +345,32 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    shared_iopage_t *p = s->ioreq.va;
 
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    ASSERT(p != NULL);
 
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    return &p->vcpu_ioreq[id];
 }
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
-    if ( !(p = get_ioreq(v)) )
-         return 0;
+    if ( !s )
+        return 0;
 
+    p = get_ioreq(s, v->vcpu_id);
     return ( p->state != STATE_IOREQ_NONE );
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
     check_wakeup_from_wait();
@@ -374,10 +378,11 @@ void hvm_do_resume(struct vcpu *v)
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
+    if ( !s )
         goto check_inject_trap;
 
+    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
+    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -387,7 +392,7 @@ void hvm_do_resume(struct vcpu *v)
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -410,7 +415,6 @@ void hvm_do_resume(struct vcpu *v)
 static void hvm_init_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
-    memset(iorp, 0, sizeof(*iorp));
     spin_lock_init(&iorp->lock);
     domain_pause(d);
 }
@@ -553,6 +557,167 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static int hvm_init_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    int i;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+
+    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
+        s->ioreq_evtchn[i] = -1;
+    s->buf_ioreq_evtchn = -1;
+
+    hvm_init_ioreq_page(d, &s->ioreq);
+    hvm_init_ioreq_page(d, &s->buf_ioreq);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_deinit_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_destroy_ioreq_page(d, &s->ioreq);
+    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
+
+    xfree(s);
+}
+
+static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    if ( s->ioreq.va != NULL )
+    {
+        shared_iopage_t *p = s->ioreq.va;
+        struct vcpu *v;
+
+        for_each_vcpu ( d, v )
+            p->vcpu_ioreq[v->vcpu_id].vp_eport = s->ioreq_evtchn[v->vcpu_id];
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    int rc;
+
+    /* Create ioreq event channel. */
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto done;
+
+    /* Register ioreq event channel. */
+    s->ioreq_evtchn[v->vcpu_id] = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        /* Create bufioreq event channel. */
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto done;
+
+        s->buf_ioreq_evtchn = rc;
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+    rc = 0;
+
+done:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    if ( v->vcpu_id == 0 )
+    {
+        if ( s->buf_ioreq_evtchn >= 0 )
+        {
+            free_xen_event_channel(v, s->buf_ioreq_evtchn);
+            s->buf_ioreq_evtchn = -1;
+        }
+    }
+
+    if ( s->ioreq_evtchn[v->vcpu_id] >= 0 )
+    {
+        free_xen_event_channel(v, s->ioreq_evtchn[v->vcpu_id]);
+        s->ioreq_evtchn[v->vcpu_id] = -1;
+    }
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     int *p_port)
+{
+    int old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+{
+    struct domain *d = s->domain;
+    struct vcpu *v;
+    int rc = 0;
+
+    domain_pause(d);
+
+    if ( d->vcpu[0] )
+    {
+        rc = hvm_replace_event_channel(d->vcpu[0], domid, &s->buf_ioreq_evtchn);
+        if ( rc < 0 )
+            goto done;
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v->vcpu_id]);
+        if ( rc < 0 )
+            goto done;
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+
+    s->domid = domid;
+
+done:
+    domain_unpause(d);
+
+    return rc;
+}
+
+static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    struct domain *d = s->domain;
+    int rc;
+
+    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
+    if ( rc < 0 )
+        return rc;
+
+    hvm_update_ioreq_server_evtchn(s);
+
+    return 0;
+}
+
+static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    struct domain *d = s->domain;
+
+    return hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -620,17 +785,20 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    rc = hvm_init_ioreq_server(d);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_deinit_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -654,8 +822,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_deinit_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1287,7 +1454,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1330,30 +1497,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
+    rc = hvm_ioreq_server_add_vcpu(s, v);
     if ( rc < 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1387,6 +1534,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1398,9 +1550,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1430,8 +1579,10 @@ void hvm_vcpu_down(struct vcpu *v)
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
@@ -1439,6 +1590,12 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    if ( !s )
+        return 0;
+
+    iorp = &s->buf_ioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1499,8 +1656,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d, s->buf_ioreq_evtchn);
     spin_unlock(&iorp->lock);
     
     return 1;
@@ -1508,19 +1664,23 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v->vcpu_id);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
@@ -1533,14 +1693,14 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
     p->df = proto_p->df;
     p->data = proto_p->data;
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(d, p->vp_eport);
 
     return 1;
 }
@@ -4133,21 +4293,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4160,7 +4305,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
         struct domain *d;
         struct vcpu *v;
 
@@ -4193,19 +4337,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_server_pfn(d->arch.hvm_domain.ioreq_server,
+                                              a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN: 
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_server_buf_pfn(d->arch.hvm_domain.ioreq_server,
+                                                  a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4260,31 +4397,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_ioreq_server_domid(d->arch.hvm_domain.ioreq_server,
+                                                a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4379,6 +4493,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         {
             switch ( a.index )
             {
+            case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+                a.value = s->buf_ioreq_evtchn;
+                break;
+            }
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..a77b83d 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,10 +41,17 @@ struct hvm_ioreq_page {
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_server {
+    struct domain          *domain;
+    domid_t                domid;
     struct hvm_ioreq_page  ioreq;
+    int                    ioreq_evtchn[MAX_HVM_VCPUS];
     struct hvm_ioreq_page  buf_ioreq;
+    int                    buf_ioreq_evtchn;
+};
 
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
@@ -100,3 +107,12 @@ struct hvm_domain {
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
 
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 122ab0d..08e98fb 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,8 +138,6 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
-
     bool_t              flag_dr_dirty;
     bool_t              debug_state_latch;
     bool_t              single_step;
@@ -186,3 +184,13 @@ struct hvm_vcpu {
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4


* [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-03-04 11:40 ` [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  2014-03-04 13:02   ` Jan Beulich
  2014-03-04 11:40 ` [PATCH v2 5/6] ioreq-server: add support for multiple servers Paul Durrant
  2014-03-04 11:40 ` [PATCH v2 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  5 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch defers creation of the ioreq server until the legacy HVM
parameters are touched by an emulator. It also lays some groundwork for
supporting multiple IOREQ servers. For instance, it introduces ioreq server
reference counting, which is not strictly necessary at this stage but will
become so when ioreq servers can be destroyed prior to the domain dying.

There is a significant change in the layout of the special pages reserved
in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
moving pages such as the xenstore page when building a domain that can
support more than one emulator.
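
Concretely, with the new special_pfn() definition below and before any
secondary emulators are added, the reserved pages end up laid out as follows
(pfns derived from the macros in this patch):

    0xfefff  PAGING    (special_pfn(0))
    0xfeffe  ACCESS
    0xfeffd  SHARING
    0xfeffc  XENSTORE
    0xfeffb  IDENT_PT
    0xfeffa  CONSOLE
    0xfeff9  IOREQ     (HVM_PARAM_IOREQ_PFN)
    0xfeff8  BUFIOREQ  (HVM_PARAM_BUFIOREQ_PFN, i.e. IOREQ - 1)

reserved_mem_pgstart is now computed as special_pfn(0) - NR_SPECIAL_PAGES, so
extra ioreq pages for secondary emulators can be added by growing
NR_SPECIAL_PAGES downwards from the BUFIOREQ page, without moving any of the
fixed pages above it.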

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c |   40 +++--
 xen/arch/x86/hvm/hvm.c         |  355 ++++++++++++++++++++++++++--------------
 2 files changed, 258 insertions(+), 137 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index dd3b522..b65e702 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -41,13 +41,12 @@
 #define SPECIALPAGE_PAGING   0
 #define SPECIALPAGE_ACCESS   1
 #define SPECIALPAGE_SHARING  2
-#define SPECIALPAGE_BUFIOREQ 3
-#define SPECIALPAGE_XENSTORE 4
-#define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+#define SPECIALPAGE_XENSTORE 3
+#define SPECIALPAGE_IDENT_PT 4
+#define SPECIALPAGE_CONSOLE  5
+#define SPECIALPAGE_IOREQ    6
+#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define special_pfn(x) (0xff000u - 1 - (x))
 
 #define VGA_HOLE_SIZE (0x20)
 
@@ -114,7 +113,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -473,6 +472,23 @@ static int setup_guest(xc_interface *xch,
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
+
+    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
+    DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
+    DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS));
+    DPRINTF("  SHARING:   %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING));
+    DPRINTF("  STORE:     %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE));
+    DPRINTF("  IDENT_PT:  %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
+    DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
+    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
+
     for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
@@ -488,10 +504,6 @@ static int setup_guest(xc_interface *xch,
 
     xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_BUFIOREQ));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
                      special_pfn(SPECIALPAGE_CONSOLE));
     xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
@@ -500,6 +512,10 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_ACCESS));
     xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ) - 1);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d9586b2..fb2dd73 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -367,22 +367,9 @@ bool_t hvm_io_pending(struct vcpu *v)
     return ( p->state != STATE_IOREQ_NONE );
 }
 
-void hvm_do_resume(struct vcpu *v)
+static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 {
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    check_wakeup_from_wait();
-
-    if ( is_hvm_vcpu(v) )
-        pt_restore_timer(v);
-
-    if ( !s )
-        goto check_inject_trap;
-
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -398,12 +385,29 @@ void hvm_do_resume(struct vcpu *v)
             break;
         default:
             gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
+            domain_crash(d);
             return; /* bail */
         }
     }
+}
+
+void hvm_do_resume(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    check_wakeup_from_wait();
+
+    if ( is_hvm_vcpu(v) )
+        pt_restore_timer(v);
+
+    if ( s )
+    {
+        ioreq_t *p = get_ioreq(s, v->vcpu_id);
+
+        hvm_wait_on_io(d, p);
+    }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -412,11 +416,13 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
 {
+    struct hvm_ioreq_page *iorp;
+
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
+
     spin_lock_init(&iorp->lock);
-    domain_pause(d);
 }
 
 void destroy_ring_for_helper(
@@ -432,16 +438,13 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, int buf)
 {
-    spin_lock(&iorp->lock);
+    struct hvm_ioreq_page *iorp;
 
-    ASSERT(d->is_dying);
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
 
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -489,8 +492,10 @@ int prepare_ring_for_helper(
 }
 
 static int hvm_set_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
+    struct hvm_ioreq_server *s, int buf, unsigned long gmfn)
 {
+    struct domain *d = s->domain;
+    struct hvm_ioreq_page *iorp;
     struct page_info *page;
     void *va;
     int rc;
@@ -498,22 +503,17 @@ static int hvm_set_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
 
     if ( (iorp->va != NULL) || d->is_dying )
     {
-        destroy_ring_for_helper(&iorp->va, iorp->page);
-        spin_unlock(&iorp->lock);
+        destroy_ring_for_helper(&va, page);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -557,38 +557,6 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
-static int hvm_init_ioreq_server(struct domain *d)
-{
-    struct hvm_ioreq_server *s;
-    int i;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    s->domain = d;
-
-    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
-        s->ioreq_evtchn[i] = -1;
-    s->buf_ioreq_evtchn = -1;
-
-    hvm_init_ioreq_page(d, &s->ioreq);
-    hvm_init_ioreq_page(d, &s->buf_ioreq);
-
-    d->arch.hvm_domain.ioreq_server = s;
-    return 0;
-}
-
-static void hvm_deinit_ioreq_server(struct domain *d)
-{
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-
-    hvm_destroy_ioreq_page(d, &s->ioreq);
-    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
-
-    xfree(s);
-}
-
 static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
 {
     struct domain *d = s->domain;
@@ -650,6 +618,123 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     }
 }
 
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+    unsigned long pfn;
+    struct vcpu *v;
+    int i, rc;
+
+    rc = -EEXIST;
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        goto fail_exist;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+
+    rc = -ENOMEM;
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        goto fail_alloc;
+
+    s->domain = d;
+    s->domid = domid;
+
+    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
+        s->ioreq_evtchn[i] = -1;
+    s->buf_ioreq_evtchn = -1;
+
+    /* Initialize shared pages */
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 0);
+    if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
+        goto fail_set_ioreq;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 1);
+    if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
+        goto fail_set_buf_ioreq;
+
+    domain_pause(d);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
+            goto fail_add_vcpu;
+    }
+
+    d->arch.hvm_domain.ioreq_server = s;
+
+    domain_unpause(d);
+
+    return 0;
+
+fail_add_vcpu:
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+    domain_unpause(d);
+    hvm_destroy_ioreq_page(s, 1);
+fail_set_buf_ioreq:
+    hvm_destroy_ioreq_page(s, 0);
+fail_set_ioreq:
+    xfree(s);
+fail_alloc:
+fail_exist:
+    return rc;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    struct vcpu *v;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        return;
+
+    domain_pause(d);
+
+    d->arch.hvm_domain.ioreq_server = NULL;
+
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    domain_unpause(d);
+
+    hvm_destroy_ioreq_page(s, 1);
+    hvm_destroy_ioreq_page(s, 0);
+
+    xfree(s);
+}
+
+static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    if ( !s )
+        return -ENOENT;
+
+    *port = s->buf_ioreq_evtchn;
+    return 0;
+}
+
+static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int i;
+
+    if ( !s )
+        return -ENOENT;
+
+    i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
+    *pfn = d->arch.hvm_domain.params[i];
+
+    return 0;
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      int *p_port)
 {
@@ -665,14 +750,22 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
 {
-    struct domain *d = s->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     struct vcpu *v;
     int rc = 0;
 
     domain_pause(d);
 
+    rc = -ENOENT;
+    if ( !s )
+        goto done;
+
+    rc = 0;
+    if ( s->domid == domid )
+        goto done;
+
     if ( d->vcpu[0] )
     {
         rc = hvm_replace_event_channel(d->vcpu[0], domid, &s->buf_ioreq_evtchn);
@@ -697,27 +790,6 @@ done:
     return rc;
 }
 
-static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    struct domain *d = s->domain;
-    int rc;
-
-    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
-    if ( rc < 0 )
-        return rc;
-
-    hvm_update_ioreq_server_evtchn(s);
-
-    return 0;
-}
-
-static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    struct domain *d = s->domain;
-
-    return hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
-}
-
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -785,20 +857,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    rc = hvm_init_ioreq_server(d);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_deinit_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -822,7 +888,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_deinit_ioreq_server(d);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1497,9 +1563,12 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
-    if ( rc < 0 )
-        goto fail6;
+    if ( s )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc < 0 )
+            goto fail6;
+    }
 
     if ( v->vcpu_id == 0 )
     {
@@ -1537,7 +1606,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1662,19 +1732,12 @@ int hvm_buffered_io_send(ioreq_t *p)
     return 1;
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
+static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
+                                            struct vcpu *v,
+                                            ioreq_t *proto_p)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
-        return 0; /* implicitly bins the i/o operation */
-
-    if ( !s )
-        return 0;
-
-    p = get_ioreq(s, v->vcpu_id);
+    ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
@@ -1705,6 +1768,20 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
     return 1;
 }
 
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
+        return 0;
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_server(s, v, p);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4336,14 +4413,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
             switch ( a.index )
             {
-            case HVM_PARAM_IOREQ_PFN:
-                rc = hvm_set_ioreq_server_pfn(d->arch.hvm_domain.ioreq_server,
-                                              a.value);
-                break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
-                rc = hvm_set_ioreq_server_buf_pfn(d->arch.hvm_domain.ioreq_server,
-                                                  a.value);
-                break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
                 hvm_latch_shinfo_size(d);
@@ -4397,8 +4466,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_set_ioreq_server_domid(d->arch.hvm_domain.ioreq_server,
-                                                a.value);
+                rc = hvm_create_ioreq_server(d, a.value);
+                if ( rc == -EEXIST )
+                    rc = hvm_set_ioreq_server_domid(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4493,12 +4563,47 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         {
             switch ( a.index )
             {
-            case HVM_PARAM_BUFIOREQ_EVTCHN: {
-                struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_EVTCHN:
+                /* May need to create server */
+                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+
+                switch ( a.index )
+                {
+                case HVM_PARAM_IOREQ_PFN: {
+                    xen_pfn_t pfn;
+
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
+                        goto param_fail;
+
+                    a.value = pfn;
+                    break;
+                }
+                case HVM_PARAM_BUFIOREQ_PFN: {
+                    xen_pfn_t pfn;
+
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
+                        goto param_fail;
+
+                    a.value = pfn;
+                    break;
+                }
+                case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                    evtchn_port_t port;
 
-                a.value = s->buf_ioreq_evtchn;
+                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
+                        goto param_fail;
+
+                    a.value = port;
+                    break;
+                }
+                default:
+                    BUG();
+                }
                 break;
-            }
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-03-04 11:40 ` [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  2014-03-04 12:06   ` Andrew Cooper
  2014-03-10 18:41   ` George Dunlap
  2014-03-04 11:40 ` [PATCH v2 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  5 siblings, 2 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

The legacy 'catch-all' server is always created with id 0. Secondary
servers will have an id ranging from 1 to a limit set by the toolstack
via the 'max_emulators' build info field. This defaults to 1, so
ordinarily no extra special pages are reserved for secondary emulators.
It may be increased using the secondary_device_emulators parameter in
xl.cfg(5), as illustrated in the configuration sketch below. There is no
clear limit to apply to the number of emulators so I've not applied one.
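
For illustration, a guest that wants two secondary emulators in addition
to the usual device model would need something along the lines of the
following xl configuration fragment (all values other than
secondary_device_emulators are made up for the example; xl turns this
setting into max_emulators = secondary_device_emulators + 1):

  builder = "hvm"
  device_model_version = "qemu-xen"
  # reserve ioreq server slots (and special pages) for two extra emulators
  secondary_device_emulators = 2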

Because of the re-arrangement of the special pages in a previous patch,
only the addition of the HVM_PARAM_NR_IOREQ_SERVERS parameter is needed
to determine the layout of the shared pages for multiple emulators; a
worked example of that layout is sketched below. Guests migrated in from
hosts without this patch will lack the save record which stores the new
parameter, and so such guests are assumed to have had only a single
emulator.
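
To make the layout concrete, here is a rough sketch of how the special
pages fall out given the defines in xc_hvm_build_x86.c, assuming a
hypothetical max_emulators = 2 (the pfn values are simply what those
defines evaluate to):

  /*
   * special_pfn(x) = 0xff000 - 1 - x and SPECIALPAGE_IOREQ = 6, so:
   *
   *   HVM_PARAM_IOREQ_PFN    = special_pfn(6)     = 0xfeff9
   *   HVM_PARAM_BUFIOREQ_PFN = special_pfn(6) - 2 = 0xfeff7
   *
   * Each server s then maps params[IOREQ_PFN] - s and
   * params[BUFIOREQ_PFN] - s:
   *
   *   server 0 (catch-all): sync page 0xfeff9, buffered page 0xfeff7
   *   server 1 (secondary): sync page 0xfeff8, buffered page 0xfeff6
   */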

Added some more emacs boilerplate to xenctrl.h and xenguest.h.
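
For reference, the rough calling sequence expected of a secondary
emulator (along the lines of the demu example) using the new xenctrl
interface is sketched below. The function name, domid handling, port
range and BDF are made-up illustrative values and error handling is
mostly elided; this is not taken verbatim from demu:

  #include <xenctrl.h>

  static int demo_emulator(domid_t domid)
  {
      xc_interface *xch = xc_interface_open(NULL, NULL, 0);
      ioservid_t id;
      xen_pfn_t pfn, buf_pfn;
      evtchn_port_t buf_port;

      if ( !xch )
          return -1;

      /* Allocate an ioreq server id (1 or above; 0 is the catch-all). */
      if ( xc_hvm_create_ioreq_server(xch, domid, &id) < 0 )
          goto fail;

      /* Find the shared pages and buffered-ioreq event channel to use. */
      if ( xc_hvm_get_ioreq_server_info(xch, domid, id,
                                        &pfn, &buf_pfn, &buf_port) < 0 )
          goto fail;

      /* Claim an IO port range and a PCI device (00:03.0 => bdf 0x0018). */
      xc_hvm_map_io_range_to_ioreq_server(xch, domid, id, 0 /* portio */,
                                          0xc100, 0xc1ff);
      xc_hvm_map_pcidev_to_ioreq_server(xch, domid, id, 0x0018);

      /* ... map pfn/buf_pfn, bind buf_port and service ioreqs here ... */

      xc_hvm_destroy_ioreq_server(xch, domid, id);
      xc_interface_close(xch);
      return 0;

  fail:
      xc_interface_close(xch);
      return -1;
  }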

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 docs/man/xl.cfg.pod.5            |    7 +
 tools/libxc/xc_domain.c          |  175 +++++++
 tools/libxc/xc_domain_restore.c  |   20 +
 tools/libxc/xc_domain_save.c     |   12 +
 tools/libxc/xc_hvm_build_x86.c   |   24 +-
 tools/libxc/xenctrl.h            |   51 ++
 tools/libxc/xenguest.h           |   12 +
 tools/libxc/xg_save_restore.h    |    1 +
 tools/libxl/libxl.h              |    8 +
 tools/libxl/libxl_create.c       |    3 +
 tools/libxl/libxl_dom.c          |    1 +
 tools/libxl/libxl_types.idl      |    1 +
 tools/libxl/xl_cmdimpl.c         |    3 +
 xen/arch/x86/hvm/hvm.c           |  951 +++++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/include/asm-x86/hvm/domain.h |   23 +-
 xen/include/asm-x86/hvm/hvm.h    |    1 +
 xen/include/public/hvm/hvm_op.h  |   70 +++
 xen/include/public/hvm/ioreq.h   |    1 +
 xen/include/public/hvm/params.h  |    4 +-
 20 files changed, 1300 insertions(+), 70 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index e15a49f..0226c55 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
 This parameter only takes effect when device_model_version=qemu-xen.
 See F<docs/misc/pci-device-reservations.txt> for more information.
 
+=item B<secondary_device_emulators=NUMBER>
+
+If secondary device emulators (i.e. in addition to qemu-xen or
+qemu-xen-traditional) are to be invoked to support the guest then
+this parameter should be set to the number of such emulators to be
+used. The default value is zero.
+
 =back
 
 =head2 Device-Model Options
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..dfa905b 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    rc = do_xen_hypercall(xch, &hypercall);
+    *id = arg->id;
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *pfn,
+                                 xen_pfn_t *buf_pfn,
+                                 evtchn_port_t *buf_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( pfn )
+        *pfn = arg->pfn;
+
+    if ( buf_pfn )
+        *buf_pfn = arg->buf_pfn;
+
+    if ( buf_port )
+        *buf_port = arg->buf_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    arg->end = end;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 1f6ce50..3116653 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -746,6 +746,7 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t nr_ioreq_servers;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
+        /* Skip padding 4 bytes then read the number of IOREQ servers. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
+        {
+            PERROR("error reading the number of IOREQ servers");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    if ( hvm ) {
+        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
+
+        if ( nr_ioreq_servers == 0 )
+            nr_ioreq_servers = 1;
+
+        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
+    }
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 42c4752..3293e29 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the number of IOREQ servers");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index b65e702..6d6328a 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -45,7 +45,7 @@
 #define SPECIALPAGE_IDENT_PT 4
 #define SPECIALPAGE_CONSOLE  5
 #define SPECIALPAGE_IOREQ    6
-#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
 #define special_pfn(x) (0xff000u - 1 - (x))
 
 #define VGA_HOLE_SIZE (0x20)
@@ -85,7 +85,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           int max_emulators)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -113,7 +114,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES(max_emulators);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -256,6 +257,10 @@ static int setup_guest(xc_interface *xch,
         stat_1gb_pages = 0;
     int pod_mode = 0;
     int claim_enabled = args->claim_enabled;
+    int max_emulators = args->max_emulators;
+
+    if ( max_emulators < 1 )
+        goto error_out;
 
     if ( nr_pages > target_pages )
         pod_mode = XENMEMF_populate_on_demand;
@@ -468,12 +473,13 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
+                   max_emulators);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
 
-    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
+    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES(max_emulators));
     DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
             (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
     DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
@@ -486,10 +492,10 @@ static int setup_guest(xc_interface *xch,
             (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
     DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
             (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
-    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
+    DPRINTF("  IOREQ(%02d): %"PRI_xen_pfn"\n", max_emulators * 2,
             (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
 
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
         rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
@@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
     xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
                      special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ) - 1);
+                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                     max_emulators);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..84cab13 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ server API
+ */
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id);
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *pfn,
+                                 xen_pfn_t *buf_pfn,
+                                 evtchn_port_t *buf_port);
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        int is_mmio,
+                                        uint64_t start,
+                                        uint64_t end);
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            int is_mmio,
+                                            uint64_t start);
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+                                      domid_t domid,
+                                      ioservid_t id,
+                                      uint16_t bdf);
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t bdf);
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
@@ -2428,3 +2469,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
 int xc_kexec_unload(xc_interface *xch, int type);
 
 #endif /* XENCTRL_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index a0e30e1..1300933 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -234,6 +234,8 @@ struct xc_hvm_build_args {
     struct xc_hvm_firmware_module smbios_module;
     /* Whether to use claim hypercall (1 - enable, 0 - disable). */
     int claim_enabled;
+    /* Maximum number of emulators for VM */
+    int max_emulators;
 };
 
 /**
@@ -306,3 +308,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
                       int prot,
                       unsigned long *mfn0);
 #endif /* XENGUEST_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..5170b7f 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,7 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 06bbca6..5a70b76 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -95,6 +95,14 @@
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
 /*
+ * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
+ * max_emulators field is present in the hvm sections of
+ * libxl_domain_build_info. This field can be used to reserve
+ * extra special pages for secondary device emulators.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a604cd8..cce93d9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        if (b_info->u.hvm.max_emulators < 1)
+            b_info->u.hvm.max_emulators = 1;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 55f74b2..9de06f9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
     args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
     args.claim_enabled = libxl_defbool_val(info->claim_mode);
+    args.max_emulators = info->u.hvm.max_emulators;
     if (libxl__domain_firmware(gc, info, &args)) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..b707159 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -372,6 +372,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("xen_platform_pci", libxl_defbool),
                                        ("usbdevice_list",   libxl_string_list),
                                        ("vendor_device",    libxl_vendor_device),
+                                       ("max_emulators",    integer),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4fc46eb..cf9b67d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1750,6 +1750,9 @@ skip_vfb:
 
             b_info->u.hvm.vendor_device = d;
         }
+ 
+        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
+            b_info->u.hvm.max_emulators = l + 1;
     }
 
     xlu_cfg_destroy(config);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index fb2dd73..e8b73fa 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
 bool_t hvm_io_pending(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
+    struct list_head *entry;
 
-    if ( !s )
-        return 0;
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
-    p = get_ioreq(s, v->vcpu_id);
-    return ( p->state != STATE_IOREQ_NONE );
+        if ( p->state != STATE_IOREQ_NONE )
+            return 1;
+    }
+
+    return 0;
 }
 
 static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
@@ -394,18 +401,20 @@ static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct list_head *entry;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( s )
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
     {
-        ioreq_t *p = get_ioreq(s, v->vcpu_id);
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
 
-        hvm_wait_on_io(d, p);
+        hvm_wait_on_io(d, get_ioreq(s, v->vcpu_id));
     }
 
     /* Inject pending hw/sw trap */
@@ -543,6 +552,83 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( dir == IOREQ_WRITE )
+    {
+        switch ( bytes )
+        {
+        case 4:
+            hd->pci_cf8 = *val;
+            break;
+
+        case 2:
+        {
+            uint32_t mask = 0xffff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+            
+        case 1:
+        {
+            uint32_t mask = 0xff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+
+        default:
+            break;
+        }
+
+        /* We always need to fall through to the catch all emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        switch ( bytes )
+        {
+        case 4:
+            *val = hd->pci_cf8;
+            rc = X86EMUL_OKAY;
+            break;
+
+        case 2:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
+            rc = X86EMUL_OKAY;
+            break;
+            
+        case 1:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
+            rc = X86EMUL_OKAY;
+            break;
+
+        default:
+            rc = X86EMUL_UNHANDLEABLE;
+            break;
+        }
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -618,39 +704,53 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     }
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
 {
     struct hvm_ioreq_server *s;
     unsigned long pfn;
     struct vcpu *v;
     int i, rc;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
     rc = -EEXIST;
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
-        goto fail_exist;
+    list_for_each_entry ( s, 
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto fail_exist;
+    }
 
-    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     rc = -ENOMEM;
     s = xzalloc(struct hvm_ioreq_server);
     if ( !s )
         goto fail_alloc;
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
+    INIT_LIST_HEAD(&s->mmio_range_list);
+    INIT_LIST_HEAD(&s->portio_range_list);
+    INIT_LIST_HEAD(&s->pcidev_list);
 
     for ( i = 0; i < MAX_HVM_VCPUS; i++ )
         s->ioreq_evtchn[i] = -1;
     s->buf_ioreq_evtchn = -1;
 
     /* Initialize shared pages */
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 0);
     if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
         goto fail_set_ioreq;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 1);
     if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
@@ -664,10 +764,12 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
             goto fail_add_vcpu;
     }
 
-    d->arch.hvm_domain.ioreq_server = s;
+    list_add(&s->list_entry,
+             &d->arch.hvm_domain.ioreq_server_list);
 
     domain_unpause(d);
 
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return 0;
 
 fail_add_vcpu:
@@ -681,23 +783,33 @@ fail_set_ioreq:
     xfree(s);
 fail_alloc:
 fail_exist:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
     struct vcpu *v;
 
-    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        return;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry)
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+found:
+    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     domain_pause(d);
 
-    d->arch.hvm_domain.ioreq_server = NULL;
+    list_del_init(&s->list_entry);
 
     for_each_vcpu ( d, v )
         hvm_ioreq_server_remove_vcpu(s, v);
@@ -708,31 +820,373 @@ static void hvm_destroy_ioreq_server(struct domain *d)
     hvm_destroy_ioreq_page(s, 0);
 
     xfree(s);
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 }
 
-static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
+static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t id,
+                                         evtchn_port_t *port)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct list_head *entry;
+    int rc;
 
-    if ( !s )
-        return -ENOENT;
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s->id == id )
+        {
+            *port = s->buf_ioreq_evtchn;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, int buf,
+                                    xen_pfn_t *pfn)
+{
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s->id == id )
+        {
+            int i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
+
+            *pfn = d->arch.hvm_domain.params[i] - s->id;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            int is_mmio, uint64_t start, uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_io_range *x;
+    struct list_head *list;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_io_range);
+    if ( x == NULL )
+        return -ENOMEM;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->start = start;
+    x->end = end;
+
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+    list_add_rcu(&x->list_entry, list);
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
+             d->domain_id,
+             s->id,
+             ( is_mmio ) ? "MMIO" : "PORTIO",
+             x->start,
+             x->end);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    *port = s->buf_ioreq_evtchn;
     return 0;
+
+fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
 }
 
-static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
+static void free_io_range(struct rcu_head *rcu)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    int i;
+    struct hvm_io_range *x;
 
-    if ( !s )
-        return -ENOENT;
+    x = container_of (rcu, struct hvm_io_range, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                                int is_mmio, uint64_t start)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *list, *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+found:
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+
+    list_for_each ( entry,
+                    list )
+    {
+        struct hvm_io_range *x = list_entry(entry,
+                                            struct hvm_io_range,
+                                            list_entry);
+
+        if ( start == x->start )
+        {
+            gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
+                     d->domain_id,
+                     s->id,
+                     ( is_mmio ) ? "MMIO" : "PORTIO",
+                     x->start,
+                     x->end);
+
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_io_range);
 
-    i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
-    *pfn = d->arch.hvm_domain.params[i];
+            rc = 0;
+            break;
+        }
+    }
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
+                                          uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_pcidev *x;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_pcidev);
+    if ( x == NULL )
+        return -ENOMEM;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->bdf = bdf;
+
+    list_add_rcu(&x->list_entry, &s->pcidev_list);
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
+             d->domain_id,
+             s->id,
+             x->bdf);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
     return 0;
+
+fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void free_pcidev(struct rcu_head *rcu)
+{
+    struct hvm_pcidev *x;
+
+    x = container_of (rcu, struct hvm_pcidev, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
+                                              uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+found:
+    list_for_each ( entry,
+                    &s->pcidev_list )
+    {
+        struct hvm_pcidev *x = list_entry(entry,
+                                          struct hvm_pcidev,
+                                          list_entry);
+
+        if ( bdf == x->bdf )
+        {
+            gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
+                     d->domain_id,
+                     s->id,
+                     x->bdf);
+
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_pcidev);
+
+            rc = 0;
+            break;
+        }
+    }
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
+            goto fail;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+fail:
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    ioservid_t id;
+
+    for ( id = 0;
+          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+        hvm_destroy_ioreq_server(d, id);
 }
 
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
@@ -750,18 +1204,31 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     struct vcpu *v;
     int rc = 0;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
     domain_pause(d);
 
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
     rc = -ENOENT;
-    if ( !s )
-        goto done;
+    goto done;
 
+found:
     rc = 0;
     if ( s->domid == domid )
         goto done;
@@ -787,6 +1254,8 @@ static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
 done:
     domain_unpause(d);
 
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     return rc;
 }
 
@@ -817,6 +1286,9 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
+    spin_lock_init(&d->arch.hvm_domain.pci_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -858,6 +1330,7 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -888,7 +1361,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1520,7 +1993,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1563,12 +2035,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    if ( s )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc < 0 )
-            goto fail6;
-    }
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
+    if ( rc < 0 )
+        goto fail6;
 
     if ( v->vcpu_id == 0 )
     {
@@ -1604,10 +2073,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1646,11 +2113,112 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
+static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
+
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
+{
+#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
+
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc )
+    { 
+        /* PCI config data cycle */
+        type = IOREQ_TYPE_PCI_CONFIG;
+
+        spin_lock(&d->arch.hvm_domain.pci_lock);
+        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
+        spin_unlock(&d->arch.hvm_domain.pci_lock);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    rcu_read_lock(&ioreq_server_rcu_lock);
+
+    switch ( type )
+    {
+    case IOREQ_TYPE_COPY:
+    case IOREQ_TYPE_PIO:
+    case IOREQ_TYPE_PCI_CONFIG:
+        break;
+    default:
+        goto done;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        switch ( type )
+        {
+            case IOREQ_TYPE_COPY:
+            case IOREQ_TYPE_PIO: {
+                struct list_head *list;
+                struct hvm_io_range *x;
+
+                list = ( type == IOREQ_TYPE_COPY ) ?
+                    &s->mmio_range_list :
+                    &s->portio_range_list;
+
+                list_for_each_entry ( x,
+                                      list,
+                                      list_entry )
+                {
+                    if ( (addr >= x->start) && (addr <= x->end) )
+                        goto found;
+                }
+                break;
+            }
+            case IOREQ_TYPE_PCI_CONFIG: {
+                struct hvm_pcidev *x;
+
+                list_for_each_entry ( x,
+                                      &s->pcidev_list,
+                                      list_entry )
+                {
+                    if ( BDF(addr) == x->bdf ) {
+                        p->type = type;
+                        p->addr = addr;
+                        goto found;
+                    }
+                }
+                break;
+            }
+        }
+    }
+
+done:
+    /* The catch-all server has id 0 */
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == 0 )
+            goto found;
+    }
+
+    s = NULL;
+
+found:
+    rcu_read_unlock(&ioreq_server_rcu_lock);
+
+    return s;
+
+#undef BDF
+}
+
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     struct hvm_ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp;
@@ -1660,6 +2228,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    s = hvm_select_ioreq_server(v, p);
     if ( !s )
         return 0;
 
@@ -1770,18 +2339,34 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
 {
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0;
 
+    s = hvm_select_ioreq_server(v, p);
     if ( !s )
         return 0;
 
     return hvm_send_assist_req_to_server(s, v, p);
 }
 
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct list_head *entry;
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        (void) hvm_send_assist_req_to_server(s, v, p);
+    }
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4370,6 +4955,215 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    ioservid_t id;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = -ENOSPC;
+    for ( id = 1;
+          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+    {
+        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
+        if ( rc == -EEXIST )
+            continue;
+
+        break;
+    }
+
+    if ( rc == -EEXIST )
+        rc = -ENOSPC;
+
+    if ( rc < 0 )
+        goto out;
+
+    op.id = id;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_io_range_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
+                                          op.start, op.end);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_io_range_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
+                                              op.start);
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_pcidev_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pcidev_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_pcidev_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    hvm_destroy_ioreq_server(d, op.id);
+    rc = 0;
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4378,6 +5172,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
+        break;
+
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
+        break;
+
+    case HVMOP_map_pcidev_to_ioreq_server:
+        rc = hvmop_map_pcidev_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
+        break;
+
+    case HVMOP_unmap_pcidev_from_ioreq_server:
+        rc = hvmop_unmap_pcidev_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
+        break;
+
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4466,9 +5295,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_create_ioreq_server(d, a.value);
+                rc = hvm_create_ioreq_server(d, 0, a.value);
                 if ( rc == -EEXIST )
-                    rc = hvm_set_ioreq_server_domid(d, a.value);
+                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4533,6 +5362,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_NR_IOREQ_SERVERS:
+                if ( d == current->domain )
+                    rc = -EPERM;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4567,7 +5400,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN:
                 /* May need to create server */
-                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
 
@@ -4576,7 +5409,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_IOREQ_PFN: {
                     xen_pfn_t pfn;
 
-                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
                         goto param_fail;
 
                     a.value = pfn;
@@ -4585,7 +5418,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_BUFIOREQ_PFN: {
                     xen_pfn_t pfn;
 
-                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
                         goto param_fail;
 
                     a.value = pfn;
@@ -4594,7 +5427,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_BUFIOREQ_EVTCHN: {
                     evtchn_port_t port;
 
-                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
                         goto param_fail;
 
                     a.value = port;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index c9adb94..ac0d867 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -75,7 +75,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    (void)hvm_send_assist_req(v, &p);
+    hvm_broadcast_assist_req(v, &p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index a77b83d..e9da543 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,17 +41,38 @@ struct hvm_ioreq_page {
     void *va;
 };
 
+struct hvm_io_range {
+    struct list_head    list_entry;
+    uint64_t            start, end;
+    struct rcu_head     rcu;
+};
+
+struct hvm_pcidev {
+    struct list_head    list_entry;
+    uint16_t            bdf;
+    struct rcu_head     rcu;
+};
+
 struct hvm_ioreq_server {
+    struct list_head       list_entry;
+    ioservid_t             id;
     struct domain          *domain;
     domid_t                domid;
     struct hvm_ioreq_page  ioreq;
     int                    ioreq_evtchn[MAX_HVM_VCPUS];
     struct hvm_ioreq_page  buf_ioreq;
     int                    buf_ioreq_evtchn;
+    struct list_head       mmio_range_list;
+    struct list_head       portio_range_list;
+    struct list_head       pcidev_list;
 };
 
 struct hvm_domain {
-    struct hvm_ioreq_server *ioreq_server;
+    struct list_head        ioreq_server_list;
+    spinlock_t              ioreq_server_lock;
+    uint32_t                pci_cf8;
+    spinlock_t              pci_lock;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 40aeddf..4118669 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -229,6 +229,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a9aab4b..6b31189 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+typedef uint32_t ioservid_t;
+
+DEFINE_XEN_GUEST_HANDLE(ioservid_t);
+
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+    xen_pfn_t pfn;          /* OUT - ioreq pfn */
+    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
+    evtchn_port_t buf_port; /* OUT - buf ioreq port */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+#define HVMOP_map_io_range_to_ioreq_server 19
+struct xen_hvm_map_io_range_to_ioreq_server {
+    domid_t domid;                  /* IN - domain to be serviced */
+    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
+    int is_mmio;                    /* IN - MMIO or port IO? */
+    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
+
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_unmap_io_range_from_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint8_t is_mmio;        /* IN - MMIO or port IO? */
+    uint64_aligned_t start; /* IN - start address of the range to remove */
+};
+typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
+
+#define HVMOP_map_pcidev_to_ioreq_server 21
+struct xen_hvm_map_pcidev_to_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
+
+#define HVMOP_unmap_pcidev_from_ioreq_server 22
+struct xen_hvm_unmap_pcidev_from_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
+
+#define HVMOP_destroy_ioreq_server 23
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..e84fa75 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,6 +34,7 @@
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..4109b11 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,8 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_NR_IOREQ_SERVERS 32
+
+#define HVM_NR_PARAMS          33
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-03-04 11:40 ` [PATCH v2 5/6] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-03-04 11:40 ` Paul Durrant
  5 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 11:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Because we may now have more than one emulator, the implementation of the
PCI hotplug controller needs to be done by Xen. Happily the code is very
short and simple, and it also removes the need for a different ACPI DSDT
when using different variants of QEMU.
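
For illustration only (this snippet is not part of the patch), a minimal
sketch of how a toolstack or secondary emulator might drive the new
controller via the libxc wrappers added by this patch; slot 5 is an
arbitrary example:

    #include <xenctrl.h>

    /* Advertise a device appearing in slot 5: Xen sets the slot's bit
     * in PCIU, latches the hotplug GPE status bit and asserts the SCI
     * (if the guest has enabled that GPE), and the guest's _E01 method
     * issues Notify(\_SB.PCI0.S5, 1). */
    static int plug_demo_device(xc_interface *xch, domid_t domid)
    {
        return xc_hvm_pci_hotplug_enable(xch, domid, 5 /* slot */);
    }

    /* Request removal: the slot's bit shows up in PCID, the guest gets
     * Notify(\_SB.PCI0.S5, 3) and its _EJ0 method writes the slot mask
     * to B0EJ (port 0xae08), which handle_pci_hotplug_io() turns into
     * pci_hotplug_eject(). */
    static int unplug_demo_device(xc_interface *xch, domid_t domid)
    {
        return xc_hvm_pci_hotplug_disable(xch, domid, 5 /* slot */);
    }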

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/firmware/hvmloader/acpi/mk_dsdt.c |  147 ++++------------------
 tools/libxc/xc_domain.c                 |   46 +++++++
 tools/libxc/xenctrl.h                   |   11 ++
 tools/libxl/libxl_pci.c                 |   15 +++
 xen/arch/x86/hvm/Makefile               |    1 +
 xen/arch/x86/hvm/hotplug.c              |  207 +++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  |   40 +++++-
 xen/include/asm-x86/hvm/domain.h        |   12 ++
 xen/include/asm-x86/hvm/io.h            |    6 +
 xen/include/public/hvm/hvm_op.h         |    9 ++
 xen/include/public/hvm/ioreq.h          |    2 +
 11 files changed, 373 insertions(+), 123 deletions(-)
 create mode 100644 xen/arch/x86/hvm/hotplug.c

diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
index a4b693b..6408b44 100644
--- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
+++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
@@ -58,28 +58,6 @@ static void pop_block(void)
     printf("}\n");
 }
 
-static void pci_hotplug_notify(unsigned int slt)
-{
-    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
-}
-
-static void decision_tree(
-    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
-{
-    if ( s == (e-1) )
-    {
-        (*leaf)(s);
-        return;
-    }
-
-    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
-    decision_tree((s+e)/2, e, var, leaf);
-    pop_block();
-    push_block("Else", NULL);
-    decision_tree(s, (s+e)/2, var, leaf);
-    pop_block();
-}
-
 static struct option options[] = {
     { "maxcpu", 1, 0, 'c' },
     { "dm-version", 1, 0, 'q' },
@@ -322,64 +300,21 @@ int main(int argc, char **argv)
                    dev, intx, ((dev*4+dev/8+intx)&31)+16);
     printf("})\n");
 
-    /*
-     * Each PCI hotplug slot needs at least two methods to handle
-     * the ACPI event:
-     *  _EJ0: eject a device
-     *  _STA: return a device's status, e.g. enabled or removed
-     * 
-     * Eject button would generate a general-purpose event, then the
-     * control method for this event uses Notify() to inform OSPM which
-     * action happened and on which device.
-     *
-     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
-     * in ACPI spec 3.0b for details.
-     *
-     * QEMU provides a simple hotplug controller with some I/O to handle
-     * the hotplug action and status, which is beyond the ACPI scope.
-     */
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        for ( slot = 0; slot < 0x100; slot++ )
-        {
-            push_block("Device", "S%02X", slot);
-            /* _ADR == dev:fn (16:16) */
-            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
-            /* _SUN == dev */
-            stmt("Name", "_SUN, 0x%08x", slot >> 3);
-            push_block("Method", "_EJ0, 1");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x88, \\_GPE.DPT2");
-            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
-                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
-            pop_block();
-            push_block("Method", "_STA, 0");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x89, \\_GPE.DPT2");
-            if ( slot & 1 )
-                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-            else
-                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
-            stmt("Return", "Local1"); /* IN status as the _STA */
-            pop_block();
-            pop_block();
-        }
-    } else {
-        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
-        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("B0EJ, 32,\n");
-        pop_block();
+    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
+    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("B0EJ, 32,\n");
+    pop_block();
 
-        /* hotplug_slot */
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("Device", "S%i", slot); {
-                stmt("Name", "_ADR, %#06x0000", slot);
-                push_block("Method", "_EJ0,1"); {
-                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
-                    stmt("Return", "0x0");
-                } pop_block();
-                stmt("Name", "_SUN, %i", slot);
+    /* hotplug_slot */
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("Device", "S%i", slot); {
+            stmt("Name", "_ADR, %#06x0000", slot);
+            push_block("Method", "_EJ0,1"); {
+                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
+                stmt("Return", "0x0");
             } pop_block();
-        }
+            stmt("Name", "_SUN, %i", slot);
+        } pop_block();
     }
 
     pop_block();
@@ -389,26 +324,11 @@ int main(int argc, char **argv)
     /**** GPE start ****/
     push_block("Scope", "\\_GPE");
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
-
-        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
-        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
-        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
-        for ( slot = 0; slot < 0x100; slot += 2 )
-        {
-            indent();
-            /* Each hotplug control register manages a pair of pci functions. */
-            printf("PH%02X, 8,\n", slot);
-        }
-        pop_block();
-    } else {
-        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
-        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("PCIU, 32,\n");
-        indent(); printf("PCID, 32,\n");
-        pop_block();
-    }
+    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
+    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("PCIU, 32,\n");
+    indent(); printf("PCID, 32,\n");
+    pop_block();
 
     stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
 
@@ -416,33 +336,16 @@ int main(int argc, char **argv)
     indent(); printf("DPT1, 8, DPT2, 8\n");
     pop_block();
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L03, 0, Serialized");
-        /* Detect slot and event (remove/add). */
-        stmt("Name", "SLT, 0x0");
-        stmt("Name", "EVT, 0x0");
-        stmt("Store", "PSTA, Local1");
-        stmt("And", "Local1, 0xf, EVT");
-        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
-        stmt("And", "Local1, 0xff, SLT");
-        /* Debug */
-        stmt("Store", "SLT, DPT1");
-        stmt("Store", "EVT, DPT2");
-        /* Decision tree */
-        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
+    push_block("Method", "_E01");
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
         pop_block();
-    } else {
-        push_block("Method", "_E01");
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
-            pop_block();
-            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
-            pop_block();
-        }
+        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
         pop_block();
     }
+    pop_block();
 
     pop_block();
     /**** GPE end ****/
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index dfa905b..5b49316 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1459,6 +1459,52 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+                              domid_t domid,
+                              uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 1;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+                               domid_t domid,
+                               uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 0;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 84cab13..b9c9849 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1842,6 +1842,17 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id);
 
+/*
+ * PCI hotplug API
+ */
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+                              domid_t domid,
+                              uint32_t slot);
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+                               domid_t domid,
+                               uint32_t slot);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2e52470..4176440 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         if ( rc )
             return ERROR_FAIL;
+
+        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug_enable failed");
+            return ERROR_FAIL;
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
@@ -1182,6 +1189,14 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
                                          NULL, NULL, NULL) < 0)
             goto out_fail;
 
+        rc = xc_hvm_pci_hotplug_disable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                             "Error: xc_hvm_pci_hotplug_disable failed");
+            rc = ERROR_FAIL;
+            goto out_fail;
+        }
+
         switch (libxl__device_model_version_running(gc, domid)) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
             rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..48efddb 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -3,6 +3,7 @@ subdir-y += vmx
 
 obj-y += asid.o
 obj-y += emulate.o
+obj-y += hotplug.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += i8254.o
diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
new file mode 100644
index 0000000..cd3a5b3
--- /dev/null
+++ b/xen/arch/x86/hvm/hotplug.c
@@ -0,0 +1,207 @@
+/*
+ * hvm/hotplug.c
+ *
+ * Copyright (c) 2014, Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/xmalloc.h>
+#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+
+#define SCI_IRQ 9
+
+#define GPE_BASE            (ACPI_GPE0_BLK_ADDRESS_V1)
+#define GPE_LEN             (ACPI_GPE0_BLK_LEN_V1)
+
+#define GPE_PCI_HOTPLUG_STATUS  2
+
+#define PCI_HOTPLUG_BASE    (ACPI_PCI_HOTPLUG_ADDRESS_V1)
+#define PCI_HOTPLUG_LEN     (ACPI_PCI_HOTPLUG_LEN_V1)
+
+#define PCI_UP      0
+#define PCI_DOWN    4
+#define PCI_EJECT   8
+
+static void gpe_update_sci(struct hvm_hotplug *hp)
+{
+    if ( (hp->gpe_sts[0] & hp->gpe_en[0]) & GPE_PCI_HOTPLUG_STATUS )
+        hvm_isa_irq_assert(hp->domain, SCI_IRQ);
+    else
+        hvm_isa_irq_deassert(hp->domain, SCI_IRQ);
+}
+
+static int handle_gpe_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= GPE_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        if ( port < GPE_LEN / 2 )
+        {
+            *val = hp->gpe_sts[port];
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            *val = hp->gpe_en[port];
+        }
+    } else {
+        if ( port < GPE_LEN / 2 )
+        {
+            hp->gpe_sts[port] &= ~*val;
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            hp->gpe_en[port] = *val;
+        }
+
+        gpe_update_sci(hp);
+    }
+
+done:
+    return X86EMUL_OKAY;
+}
+
+static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
+{
+    int slot = ffs(mask) - 1;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
+
+    hp->slot_down &= ~(1u << slot);
+    hp->slot_up &= ~(1u << slot);
+}
+
+static int handle_pci_hotplug_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= PCI_HOTPLUG_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        switch ( port )
+        {
+        case PCI_UP:
+            *val = hp->slot_up;
+            break;
+        case PCI_DOWN:
+            *val = hp->slot_down;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {
+        switch ( port )
+        {
+        case PCI_EJECT:
+            pci_hotplug_eject(hp, *val);
+            break;
+        default:
+            break;
+        }
+    }
+
+done:
+    return X86EMUL_OKAY;
+}
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    gdprintk(XENLOG_INFO, "%s: %s %d\n", __func__,
+             ( enable ) ? "enable" : "disable", slot);
+
+    if ( enable )
+        hp->slot_up |= (1u << slot);
+    else
+        hp->slot_down |= (1u << slot);
+
+    hp->gpe_sts[0] |= GPE_PCI_HOTPLUG_STATUS;
+    gpe_update_sci(hp);
+}
+
+int gpe_init(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    hp->domain = d;
+
+    hp->gpe_sts = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_sts == NULL )
+        goto fail1;
+
+    hp->gpe_en = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_en == NULL )
+        goto fail2;
+
+    register_portio_handler(d, GPE_BASE, GPE_LEN, handle_gpe_io);
+    register_portio_handler(d, PCI_HOTPLUG_BASE, PCI_HOTPLUG_LEN,
+                            handle_pci_hotplug_io);
+
+    return 0;
+
+fail2:
+    xfree(hp->gpe_sts);
+
+fail1:
+    return -ENOMEM;
+}
+
+void gpe_deinit(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    xfree(hp->gpe_en);
+    xfree(hp->gpe_sts);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * c-tab-always-indent: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e8b73fa..7e80dd7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1329,15 +1329,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
+    rc = gpe_init(d);
+    if ( rc != 0 )
+        goto fail2;
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    gpe_deinit(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1383,6 +1389,7 @@ void hvm_domain_destroy(struct domain *d)
         return;
 
     hvm_funcs.domain_destroy(d);
+    gpe_deinit(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
@@ -5164,6 +5171,32 @@ out:
     return rc;
 }
 
+static int hvmop_pci_hotplug(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
+{
+    xen_hvm_pci_hotplug_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    pci_hotplug(d, op.slot, op.enable);
+    rc = 0;
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -5207,6 +5240,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
         break;
     
+    case HVMOP_pci_hotplug:
+        rc = hvmop_pci_hotplug(
+            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index e9da543..178e64c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -67,12 +67,24 @@ struct hvm_ioreq_server {
     struct list_head       pcidev_list;
 };
 
+struct hvm_hotplug {
+    struct domain   *domain;
+    uint8_t         *gpe_sts;
+    uint8_t         *gpe_en;
+
+    /* PCI hotplug */
+    uint32_t        slot_up;
+    uint32_t        slot_down;
+};
+
 struct hvm_domain {
     struct list_head        ioreq_server_list;
     spinlock_t              ioreq_server_lock;
     uint32_t                pci_cf8;
     spinlock_t              pci_lock;
 
+    struct hvm_hotplug      hotplug;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 86db58d..072bfe7 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+int gpe_init(struct domain *d);
+void gpe_deinit(struct domain *d);
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 6b31189..20a53ab 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -340,6 +340,15 @@ struct xen_hvm_destroy_ioreq_server {
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+#define HVMOP_pci_hotplug 24
+struct xen_hvm_pci_hotplug {
+    domid_t domid;          /* IN - domain to be serviced */
+    uint8_t enable;         /* IN - enable or disable? */
+    uint32_t slot;          /* IN - slot to enable/disable */
+};
+typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index e84fa75..40bfa61 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
 #define ACPI_GPE0_BLK_LEN_V1         0x04
+#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
+#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
 
 /* Compatibility definitions for the default location (version 0). */
 #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-04 11:40 ` [PATCH v2 5/6] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-03-04 12:06   ` Andrew Cooper
  2014-03-05 14:44     ` Paul Durrant
  2014-03-10 18:41   ` George Dunlap
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2014-03-04 12:06 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 04/03/14 11:40, Paul Durrant wrote:
> The legacy 'catch-all' server is always created with id 0. Secondary
> servers will have an id ranging from 1 to a limit set by the toolstack
> via the 'max_emulators' build info field. This defaults to 1 so ordinarily
> no extra special pages are reserved for secondary emulators. It may be
> increased using the secondary_device_emulators parameter in xl.cfg(5).
> There's no clear limit to apply to the number of emulators so I've not
> applied one.
>
> Because of the re-arrangement of the special pages in a previous patch we
> only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
> the layout of the shared pages for multiple emulators. Guests migrated in
> from hosts without this patch will be lacking the save record which stores
> the new parameter and so the guest is assumed to only have had a single
> emulator.
>
> Added some more emacs boilerplate to xenctrl.h and xenguest.h
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

How does the build param interact with the hvmparam?  It appears not to.

On migrate, the receiving side will have to know, out-of-band, what to
set max_emulators to when building the domain.  The setparam code needs
to validate the hvmparam against the build param and return
-EINVAL/-E2BIG in the case that the hvmparam is too large.
xc_domain_restore() needs to detect this and abort the migration if the
guest can't be restored with the expected number of emulators.
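
Something along these lines in the HVMOP_set_param path is the sort of
check I mean (sketch only: a 'max_emulators' value recorded by the domain
builder is hypothetical, this series does not currently store one in the
hypervisor):

    case HVM_PARAM_NR_IOREQ_SERVERS:
        if ( d == current->domain )
            rc = -EPERM;
        else if ( a.value == 0 )
            rc = -EINVAL;
        /* Hypothetical field recording how many emulators' worth of
         * special pages the builder reserved for this domain. */
        else if ( a.value > d->arch.hvm_domain.max_emulators )
            rc = -E2BIG;
        break;

xc_domain_restore() could then treat a failure of the set_param as fatal
and abort the restore, instead of silently giving the guest fewer ioreq
pages than its save record implies.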

~Andrew

> ---
>  docs/man/xl.cfg.pod.5            |    7 +
>  tools/libxc/xc_domain.c          |  175 +++++++
>  tools/libxc/xc_domain_restore.c  |   20 +
>  tools/libxc/xc_domain_save.c     |   12 +
>  tools/libxc/xc_hvm_build_x86.c   |   24 +-
>  tools/libxc/xenctrl.h            |   51 ++
>  tools/libxc/xenguest.h           |   12 +
>  tools/libxc/xg_save_restore.h    |    1 +
>  tools/libxl/libxl.h              |    8 +
>  tools/libxl/libxl_create.c       |    3 +
>  tools/libxl/libxl_dom.c          |    1 +
>  tools/libxl/libxl_types.idl      |    1 +
>  tools/libxl/xl_cmdimpl.c         |    3 +
>  xen/arch/x86/hvm/hvm.c           |  951 +++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/io.c            |    2 +-
>  xen/include/asm-x86/hvm/domain.h |   23 +-
>  xen/include/asm-x86/hvm/hvm.h    |    1 +
>  xen/include/public/hvm/hvm_op.h  |   70 +++
>  xen/include/public/hvm/ioreq.h   |    1 +
>  xen/include/public/hvm/params.h  |    4 +-
>  20 files changed, 1300 insertions(+), 70 deletions(-)
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index e15a49f..0226c55 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
>  This parameter only takes effect when device_model_version=qemu-xen.
>  See F<docs/misc/pci-device-reservations.txt> for more information.
>  
> +=item B<secondary_device_emulators=NUMBER>
> +
> +If a number of secondary device emulators (i.e. in addition to
> +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> +guest then this parameter can be set with the count of how many are
> +to be used. The default value is zero.
> +
>  =back
>  
>  =head2 Device-Model Options
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..dfa905b 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>      return rc;
>  }
>  
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    *id = arg->id;
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *pfn,
> +                                 xen_pfn_t *buf_pfn,
> +                                 evtchn_port_t *buf_port)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    if ( rc != 0 )
> +        goto done;
> +
> +    if ( pfn )
> +        *pfn = arg->pfn;
> +
> +    if ( buf_pfn )
> +        *buf_pfn = arg->buf_pfn;
> +
> +    if ( buf_port )
> +        *buf_port = arg->buf_port;
> +
> +done:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                        ioservid_t id, int is_mmio,
> +                                        uint64_t start, uint64_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    arg->end = end;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                            ioservid_t id, int is_mmio,
> +                                            uint64_t start)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                      ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                          ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> +                                domid_t domid,
> +                                ioservid_t id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_domain_setdebugging(xc_interface *xch,
>                             uint32_t domid,
>                             unsigned int enable)
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index 1f6ce50..3116653 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -746,6 +746,7 @@ typedef struct {
>      uint64_t acpi_ioport_location;
>      uint64_t viridian;
>      uint64_t vm_generationid_addr;
> +    uint64_t nr_ioreq_servers;
>  
>      struct toolstack_data_t tdata;
>  } pagebuf_t;
> @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
>          DPRINTF("read generation id buffer address");
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>  
> +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> +        /* Skip padding 4 bytes then read the number of IOREQ servers. */
> +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> +        {
> +            PERROR("error reading the number of IOREQ servers");
> +            return -1;
> +        }
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
> @@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      if (pagebuf.viridian != 0)
>          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
>  
> +    if ( hvm ) {
> +        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
> +
> +        if ( nr_ioreq_servers == 0 )
> +            nr_ioreq_servers = 1;
> +
> +        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
> +    }
> +
>      if (pagebuf.acpi_ioport_location == 1) {
>          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 42c4752..3293e29 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>              PERROR("Error when writing the viridian flag");
>              goto out;
>          }
> +
> +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> +        chunk.data = 0;
> +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                         (unsigned long *)&chunk.data);
> +
> +        if ( (chunk.data != 0) &&
> +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> +        {
> +            PERROR("Error when writing the number of IOREQ servers");
> +            goto out;
> +        }
>      }
>  
>      if ( callbacks != NULL && callbacks->toolstack_save != NULL )
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index b65e702..6d6328a 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -45,7 +45,7 @@
>  #define SPECIALPAGE_IDENT_PT 4
>  #define SPECIALPAGE_CONSOLE  5
>  #define SPECIALPAGE_IOREQ    6
> -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
>  #define special_pfn(x) (0xff000u - 1 - (x))
>  
>  #define VGA_HOLE_SIZE (0x20)
> @@ -85,7 +85,8 @@ static int modules_init(struct xc_hvm_build_args *args,
>  }
>  
>  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> +                           uint64_t mmio_start, uint64_t mmio_size,
> +                           int max_emulators)
>  {
>      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
>          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> @@ -113,7 +114,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      /* Memory parameters. */
>      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> -    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
> +    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES(max_emulators);
>  
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> @@ -256,6 +257,10 @@ static int setup_guest(xc_interface *xch,
>          stat_1gb_pages = 0;
>      int pod_mode = 0;
>      int claim_enabled = args->claim_enabled;
> +    int max_emulators = args->max_emulators;
> +
> +    if ( max_emulators < 1 )
> +        goto error_out;
>  
>      if ( nr_pages > target_pages )
>          pod_mode = XENMEMF_populate_on_demand;
> @@ -468,12 +473,13 @@ static int setup_guest(xc_interface *xch,
>                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                HVM_INFO_PFN)) == NULL )
>          goto error_out;
> -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
> +                   max_emulators);
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
>  
> -    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> +    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES(max_emulators));
>      DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
>              (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
>      DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> @@ -486,10 +492,10 @@ static int setup_guest(xc_interface *xch,
>              (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
>      DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
>              (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> -    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> +    DPRINTF("  IOREQ(%02d): %"PRI_xen_pfn"\n", max_emulators * 2,
>              (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
>  
> -    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> +    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
>      {
>          xen_pfn_t pfn = special_pfn(i);
>          rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> @@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
>      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
>                       special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                     max_emulators);
>  
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 13f816b..84cab13 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
>  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
>  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
>  
> +/*
> + * IOREQ server API
> + */
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id);
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *pfn,
> +                                 xen_pfn_t *buf_pfn,
> +                                 evtchn_port_t *buf_port);
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
> +                                        domid_t domid,
> +                                        ioservid_t id,
> +                                        int is_mmio,
> +                                        uint64_t start,
> +                                        uint64_t end);
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> +                                            domid_t domid,
> +                                            ioservid_t id,
> +                                            int is_mmio,
> +                                            uint64_t start);
> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
> +                                      domid_t domid,
> +                                      ioservid_t id,
> +                                      uint16_t bdf);
> +
> +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> +                                          domid_t domid,
> +                                          ioservid_t id,
> +                                          uint16_t bdf);
> +
> +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> +                                domid_t domid,
> +                                ioservid_t id);
> +
>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> @@ -2428,3 +2469,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
>  int xc_kexec_unload(xc_interface *xch, int type);
>  
>  #endif /* XENCTRL_H */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> index a0e30e1..1300933 100644
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -234,6 +234,8 @@ struct xc_hvm_build_args {
>      struct xc_hvm_firmware_module smbios_module;
>      /* Whether to use claim hypercall (1 - enable, 0 - disable). */
>      int claim_enabled;
> +    /* Maximum number of emulators for VM */
> +    int max_emulators;
>  };
>  
>  /**
> @@ -306,3 +308,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
>                        int prot,
>                        unsigned long *mfn0);
>  #endif /* XENGUEST_H */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> index f859621..5170b7f 100644
> --- a/tools/libxc/xg_save_restore.h
> +++ b/tools/libxc/xg_save_restore.h
> @@ -259,6 +259,7 @@
>  #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
>  #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
>  #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
> +#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
>  
>  /*
>  ** We process save/restore/migrate in batches of pages; the below
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 06bbca6..5a70b76 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -95,6 +95,14 @@
>  #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
>  
>  /*
> + * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
> + * max_emulators field is present in the hvm sections of
> + * libxl_domain_build_info. This field can be used to reserve
> + * extra special pages for secondary device emulators.
> + */
> +#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
> +
> +/*
>   * libxl ABI compatibility
>   *
>   * The only guarantee which libxl makes regarding ABI compatibility
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index a604cd8..cce93d9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>  
>          libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
>  
> +        if (b_info->u.hvm.max_emulators < 1)
> +            b_info->u.hvm.max_emulators = 1;
> +
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>          libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 55f74b2..9de06f9 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
>      args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
>      args.claim_enabled = libxl_defbool_val(info->claim_mode);
> +    args.max_emulators = info->u.hvm.max_emulators;
>      if (libxl__domain_firmware(gc, info, &args)) {
>          LOG(ERROR, "initializing domain firmware failed");
>          goto out;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 649ce50..b707159 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -372,6 +372,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                         ("xen_platform_pci", libxl_defbool),
>                                         ("usbdevice_list",   libxl_string_list),
>                                         ("vendor_device",    libxl_vendor_device),
> +                                       ("max_emulators",    integer),
>                                         ])),
>                   ("pv", Struct(None, [("kernel", string),
>                                        ("slack_memkb", MemKB),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4fc46eb..cf9b67d 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1750,6 +1750,9 @@ skip_vfb:
>  
>              b_info->u.hvm.vendor_device = d;
>          }
> + 
> +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> +            b_info->u.hvm.max_emulators = l + 1;
>      }
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index fb2dd73..e8b73fa 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
>  bool_t hvm_io_pending(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
> +    struct list_head *entry;
>  
> -    if ( !s )
> -        return 0;
> +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
>  
> -    p = get_ioreq(s, v->vcpu_id);
> -    return ( p->state != STATE_IOREQ_NONE );
> +        if ( p->state != STATE_IOREQ_NONE )
> +            return 1;
> +    }
> +
> +    return 0;
>  }
>  
>  static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> @@ -394,18 +401,20 @@ static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>  void hvm_do_resume(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct list_head *entry;
>  
>      check_wakeup_from_wait();
>  
>      if ( is_hvm_vcpu(v) )
>          pt_restore_timer(v);
>  
> -    if ( s )
> +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
>      {
> -        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
>  
> -        hvm_wait_on_io(d, p);
> +        hvm_wait_on_io(d, get_ioreq(s, v->vcpu_id));
>      }
>  
>      /* Inject pending hw/sw trap */
> @@ -543,6 +552,83 @@ static int hvm_print_line(
>      return X86EMUL_OKAY;
>  }
>  
> +static int hvm_access_cf8(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *curr = current;
> +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> +    int rc;
> +
> +    BUG_ON(port < 0xcf8);
> +    port -= 0xcf8;
> +
> +    spin_lock(&hd->pci_lock);
> +
> +    if ( dir == IOREQ_WRITE )
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            hd->pci_cf8 = *val;
> +            break;
> +
> +        case 2:
> +        {
> +            uint32_t mask = 0xffff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +            
> +        case 1:
> +        {
> +            uint32_t mask = 0xff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +
> +        default:
> +            break;
> +        }
> +
> +        /* We always need to fall through to the catch all emulator */
> +        rc = X86EMUL_UNHANDLEABLE;
> +    }
> +    else
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            *val = hd->pci_cf8;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        case 2:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +            
> +        case 1:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        default:
> +            rc = X86EMUL_UNHANDLEABLE;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&hd->pci_lock);
> +
> +    return rc;
> +}
> +
>  static int handle_pvh_io(
>      int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>  {
> @@ -618,39 +704,53 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
>      }
>  }
>  
> -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
>  {
>      struct hvm_ioreq_server *s;
>      unsigned long pfn;
>      struct vcpu *v;
>      int i, rc;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
>      rc = -EEXIST;
> -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> -        goto fail_exist;
> +    list_for_each_entry ( s, 
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto fail_exist;
> +    }
>  
> -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
>  
>      rc = -ENOMEM;
>      s = xzalloc(struct hvm_ioreq_server);
>      if ( !s )
>          goto fail_alloc;
>  
> +    s->id = id;
>      s->domain = d;
>      s->domid = domid;
> +    INIT_LIST_HEAD(&s->mmio_range_list);
> +    INIT_LIST_HEAD(&s->portio_range_list);
> +    INIT_LIST_HEAD(&s->pcidev_list);
>  
>      for ( i = 0; i < MAX_HVM_VCPUS; i++ )
>          s->ioreq_evtchn[i] = -1;
>      s->buf_ioreq_evtchn = -1;
>  
>      /* Initialize shared pages */
> -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
>  
>      hvm_init_ioreq_page(s, 0);
>      if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
>          goto fail_set_ioreq;
>  
> -    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
>  
>      hvm_init_ioreq_page(s, 1);
>      if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
> @@ -664,10 +764,12 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>              goto fail_add_vcpu;
>      }
>  
> -    d->arch.hvm_domain.ioreq_server = s;
> +    list_add(&s->list_entry,
> +             &d->arch.hvm_domain.ioreq_server_list);
>  
>      domain_unpause(d);
>  
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>      return 0;
>  
>  fail_add_vcpu:
> @@ -681,23 +783,33 @@ fail_set_ioreq:
>      xfree(s);
>  fail_alloc:
>  fail_exist:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>      return rc;
>  }
>  
> -static void hvm_destroy_ioreq_server(struct domain *d)
> +static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>  {
>      struct hvm_ioreq_server *s;
>      struct vcpu *v;
>  
> -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    s = d->arch.hvm_domain.ioreq_server;
> -    if ( !s )
> -        return;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry)
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto done;
> +
> +found:
> +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
>  
>      domain_pause(d);
>  
> -    d->arch.hvm_domain.ioreq_server = NULL;
> +    list_del_init(&s->list_entry);
>  
>      for_each_vcpu ( d, v )
>          hvm_ioreq_server_remove_vcpu(s, v);
> @@ -708,31 +820,373 @@ static void hvm_destroy_ioreq_server(struct domain *d)
>      hvm_destroy_ioreq_page(s, 0);
>  
>      xfree(s);
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  }
>  
> -static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
> +static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t id,
> +                                         evtchn_port_t *port)
>  {
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct list_head *entry;
> +    int rc;
>  
> -    if ( !s )
> -        return -ENOENT;
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        if ( s->id == id )
> +        {
> +            *port = s->buf_ioreq_evtchn;
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, int buf,
> +                                    xen_pfn_t *pfn)
> +{
> +    struct list_head *entry;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        if ( s->id == id )
> +        {
> +            int i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
> +
> +            *pfn = d->arch.hvm_domain.params[i] - s->id;
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                            int is_mmio, uint64_t start, uint64_t end)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct hvm_io_range *x;
> +    struct list_head *list;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    x = xmalloc(struct hvm_io_range);
> +    if ( x == NULL )
> +        return -ENOMEM;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto fail;
> +
> +found:
> +    INIT_RCU_HEAD(&x->rcu);
> +    x->start = start;
> +    x->end = end;
> +
> +    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
> +    list_add_rcu(&x->list_entry, list);
> +
> +    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
> +             d->domain_id,
> +             s->id,
> +             ( is_mmio ) ? "MMIO" : "PORTIO",
> +             x->start,
> +             x->end);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    *port = s->buf_ioreq_evtchn;
>      return 0;
> +
> +fail:
> +    xfree(x);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
>  }
>  
> -static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
> +static void free_io_range(struct rcu_head *rcu)
>  {
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    int i;
> +    struct hvm_io_range *x;
>  
> -    if ( !s )
> -        return -ENOENT;
> +    x = container_of (rcu, struct hvm_io_range, rcu);
> +
> +    xfree(x);
> +}
> +
> +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                                int is_mmio, uint64_t start)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct list_head *list, *entry;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto done;
> +
> +found:
> +    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
> +
> +    list_for_each ( entry,
> +                    list )
> +    {
> +        struct hvm_io_range *x = list_entry(entry,
> +                                            struct hvm_io_range,
> +                                            list_entry);
> +
> +        if ( start == x->start )
> +        {
> +            gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
> +                     d->domain_id,
> +                     s->id,
> +                     ( is_mmio ) ? "MMIO" : "PORTIO",
> +                     x->start,
> +                     x->end);
> +
> +            list_del_rcu(&x->list_entry);
> +            call_rcu(&x->rcu, free_io_range);
>  
> -    i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
> -    *pfn = d->arch.hvm_domain.params[i];
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                          uint16_t bdf)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct hvm_pcidev *x;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    x = xmalloc(struct hvm_pcidev);
> +    if ( x == NULL )
> +        return -ENOMEM;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto fail;
> +
> +found:
> +    INIT_RCU_HEAD(&x->rcu);
> +    x->bdf = bdf;
> +
> +    list_add_rcu(&x->list_entry, &s->pcidev_list);
> +
> +    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
> +             d->domain_id,
> +             s->id,
> +             x->bdf);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
>      return 0;
> +
> +fail:
> +    xfree(x);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static void free_pcidev(struct rcu_head *rcu)
> +{
> +    struct hvm_pcidev *x;
> +
> +    x = container_of (rcu, struct hvm_pcidev, rcu);
> +
> +    xfree(x);
> +}
> +
> +static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                              uint16_t bdf)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct list_head *entry;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto done;
> +
> +found:
> +    list_for_each ( entry,
> +                    &s->pcidev_list )
> +    {
> +        struct hvm_pcidev *x = list_entry(entry,
> +                                          struct hvm_pcidev,
> +                                          list_entry);
> +
> +        if ( bdf == x->bdf )
> +        {
> +            gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
> +                     d->domain_id,
> +                     s->id,
> +                     x->bdf);
> +
> +            list_del_rcu(&x->list_entry);
> +            call_rcu(&x->rcu, free_pcidev);
> +
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
> +{
> +    struct list_head *entry;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
> +            goto fail;
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return 0;
> +
> +fail:
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
> +{
> +    struct list_head *entry;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +}
> +
> +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> +{
> +    ioservid_t id;
> +
> +    for ( id = 0;
> +          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> +          id++ )
> +        hvm_destroy_ioreq_server(d, id);
>  }
>  
>  static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
> @@ -750,18 +1204,31 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
>      return 0;
>  }
>  
> -static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
> +static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
>  {
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
>      struct vcpu *v;
>      int rc = 0;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
>      domain_pause(d);
>  
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
>      rc = -ENOENT;
> -    if ( !s )
> -        goto done;
> +    goto done;
>  
> +found:
>      rc = 0;
>      if ( s->domid == domid )
>          goto done;
> @@ -787,6 +1254,8 @@ static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
>  done:
>      domain_unpause(d);
>  
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
>      return rc;
>  }
>  
> @@ -817,6 +1286,9 @@ int hvm_domain_initialise(struct domain *d)
>  
>      }
>  
> +    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
> +    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
> +    spin_lock_init(&d->arch.hvm_domain.pci_lock);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
>  
> @@ -858,6 +1330,7 @@ int hvm_domain_initialise(struct domain *d)
>      rtc_init(d);
>  
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
> +    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>  
>      rc = hvm_funcs.domain_initialise(d);
>      if ( rc != 0 )
> @@ -888,7 +1361,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
>      if ( hvm_funcs.nhvm_domain_relinquish_resources )
>          hvm_funcs.nhvm_domain_relinquish_resources(d);
>  
> -    hvm_destroy_ioreq_server(d);
> +    hvm_destroy_all_ioreq_servers(d);
>  
>      msixtbl_pt_cleanup(d);
>  
> @@ -1520,7 +1993,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  {
>      int rc;
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
>  
>      hvm_asid_flush_vcpu(v);
>  
> @@ -1563,12 +2035,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
>           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
>          goto fail5;
>  
> -    if ( s )
> -    {
> -        rc = hvm_ioreq_server_add_vcpu(s, v);
> -        if ( rc < 0 )
> -            goto fail6;
> -    }
> +    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
> +    if ( rc < 0 )
> +        goto fail6;
>  
>      if ( v->vcpu_id == 0 )
>      {
> @@ -1604,10 +2073,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  void hvm_vcpu_destroy(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
>  
> -    if ( s )
> -        hvm_ioreq_server_remove_vcpu(s, v);
> +    hvm_all_ioreq_servers_remove_vcpu(d, v);
>  
>      nestedhvm_vcpu_destroy(v);
>  
> @@ -1646,11 +2113,112 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> +static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
> +
> +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
> +{
> +#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
> +
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +
> +    if ( p->type == IOREQ_TYPE_PIO &&
> +         (p->addr & ~3) == 0xcfc )
> +    { 
> +        /* PCI config data cycle */
> +        type = IOREQ_TYPE_PCI_CONFIG;
> +
> +        spin_lock(&d->arch.hvm_domain.pci_lock);
> +        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
> +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> +    }
> +    else
> +    {
> +        type = p->type;
> +        addr = p->addr;
> +    }
> +
> +    rcu_read_lock(&ioreq_server_rcu_lock);
> +
> +    switch ( type )
> +    {
> +    case IOREQ_TYPE_COPY:
> +    case IOREQ_TYPE_PIO:
> +    case IOREQ_TYPE_PCI_CONFIG:
> +        break;
> +    default:
> +        goto done;
> +    }
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        switch ( type )
> +        {
> +            case IOREQ_TYPE_COPY:
> +            case IOREQ_TYPE_PIO: {
> +                struct list_head *list;
> +                struct hvm_io_range *x;
> +
> +                list = ( type == IOREQ_TYPE_COPY ) ?
> +                    &s->mmio_range_list :
> +                    &s->portio_range_list;
> +
> +                list_for_each_entry ( x,
> +                                      list,
> +                                      list_entry )
> +                {
> +                    if ( (addr >= x->start) && (addr <= x->end) )
> +                        goto found;
> +                }
> +                break;
> +            }
> +            case IOREQ_TYPE_PCI_CONFIG: {
> +                struct hvm_pcidev *x;
> +
> +                list_for_each_entry ( x,
> +                                      &s->pcidev_list,
> +                                      list_entry )
> +                {
> +                    if ( BDF(addr) == x->bdf ) {
> +                        p->type = type;
> +                        p->addr = addr;
> +                        goto found;
> +                    }
> +                }
> +                break;
> +            }
> +        }
> +    }
> +
> +done:
> +    /* The catch-all server has id 0 */
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == 0 )
> +            goto found;
> +    }
> +
> +    s = NULL;
> +
> +found:
> +    rcu_read_unlock(&ioreq_server_rcu_lock);
> +
> +    return s;
> +
> +#undef BDF
> +}
> +
>  int hvm_buffered_io_send(ioreq_t *p)
>  {
>      struct vcpu *v = current;
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
>      struct hvm_ioreq_page *iorp;
>      buffered_iopage_t *pg;
>      buf_ioreq_t bp;
> @@ -1660,6 +2228,7 @@ int hvm_buffered_io_send(ioreq_t *p)
>      /* Ensure buffered_iopage fits in a page */
>      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
>  
> +    s = hvm_select_ioreq_server(v, p);
>      if ( !s )
>          return 0;
>  
> @@ -1770,18 +2339,34 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
>  
>  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
>  {
> -    struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
>  
>      if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
>          return 0;
>  
> +    s = hvm_select_ioreq_server(v, p);
>      if ( !s )
>          return 0;
>  
>      return hvm_send_assist_req_to_server(s, v, p);
>  }
>  
> +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
> +{
> +    struct domain *d = v->domain;
> +    struct list_head *entry;
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +
> +        (void) hvm_send_assist_req_to_server(s, v, p);
> +    }
> +}
> +
>  void hvm_hlt(unsigned long rflags)
>  {
>      struct vcpu *curr = current;
> @@ -4370,6 +4955,215 @@ static int hvmop_flush_tlb_all(void)
>      return 0;
>  }
>  
> +static int hvmop_create_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> +{
> +    struct domain *curr_d = current->domain;
> +    xen_hvm_create_ioreq_server_t op;
> +    struct domain *d;
> +    ioservid_t id;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = -ENOSPC;
> +    for ( id = 1;
> +          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> +          id++ )
> +    {
> +        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
> +        if ( rc == -EEXIST )
> +            continue;
> +
> +        break;
> +    }
> +
> +    if ( rc == -EEXIST )
> +        rc = -ENOSPC;
> +
> +    if ( rc < 0 )
> +        goto out;
> +
> +    op.id = id;
> +
> +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_get_ioreq_server_info(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
> +{
> +    xen_hvm_get_ioreq_server_info_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
> +        goto out;
> +
> +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_map_io_range_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_io_range_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
> +                                          op.start, op.end);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_unmap_io_range_from_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
> +{
> +    xen_hvm_unmap_io_range_from_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
> +                                              op.start);
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_map_pcidev_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_pcidev_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_unmap_pcidev_from_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
> +{
> +    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_destroy_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
> +{
> +    xen_hvm_destroy_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    hvm_destroy_ioreq_server(d, op.id);
> +    rc = 0;
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>  {
> @@ -4378,6 +5172,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( op )
>      {
> +    case HVMOP_create_ioreq_server:
> +        rc = hvmop_create_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_get_ioreq_server_info:
> +        rc = hvmop_get_ioreq_server_info(
> +            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
> +        break;
> +    
> +    case HVMOP_map_io_range_to_ioreq_server:
> +        rc = hvmop_map_io_range_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_unmap_io_range_from_ioreq_server:
> +        rc = hvmop_unmap_io_range_from_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_map_pcidev_to_ioreq_server:
> +        rc = hvmop_map_pcidev_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_unmap_pcidev_from_ioreq_server:
> +        rc = hvmop_unmap_pcidev_from_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_destroy_ioreq_server:
> +        rc = hvmop_destroy_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> +        break;
> +    
>      case HVMOP_set_param:
>      case HVMOP_get_param:
>      {
> @@ -4466,9 +5295,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value == DOMID_SELF )
>                      a.value = curr_d->domain_id;
>  
> -                rc = hvm_create_ioreq_server(d, a.value);
> +                rc = hvm_create_ioreq_server(d, 0, a.value);
>                  if ( rc == -EEXIST )
> -                    rc = hvm_set_ioreq_server_domid(d, a.value);
> +                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
>                  break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  /* Not reflexive, as we must domain_pause(). */
> @@ -4533,6 +5362,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value > SHUTDOWN_MAX )
>                      rc = -EINVAL;
>                  break;
> +            case HVM_PARAM_NR_IOREQ_SERVERS:
> +                if ( d == current->domain )
> +                    rc = -EPERM;
> +                break;
>              }
>  
>              if ( rc == 0 ) 
> @@ -4567,7 +5400,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              case HVM_PARAM_BUFIOREQ_PFN:
>              case HVM_PARAM_BUFIOREQ_EVTCHN:
>                  /* May need to create server */
> -                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> +                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
>                  if ( rc != 0 && rc != -EEXIST )
>                      goto param_fail;
>  
> @@ -4576,7 +5409,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_IOREQ_PFN: {
>                      xen_pfn_t pfn;
>  
> -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
>                          goto param_fail;
>  
>                      a.value = pfn;
> @@ -4585,7 +5418,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_BUFIOREQ_PFN: {
>                      xen_pfn_t pfn;
>  
> -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
>                          goto param_fail;
>  
>                      a.value = pfn;
> @@ -4594,7 +5427,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_BUFIOREQ_EVTCHN: {
>                      evtchn_port_t port;
>  
> -                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
>                          goto param_fail;
>  
>                      a.value = port;
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index c9adb94..ac0d867 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -75,7 +75,7 @@ void send_invalidate_req(void)
>          .data = ~0UL, /* flush all */
>      };
>  
> -    (void)hvm_send_assist_req(v, &p);
> +    hvm_broadcast_assist_req(v, &p);
>  }
>  
>  int handle_mmio(void)
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index a77b83d..e9da543 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -41,17 +41,38 @@ struct hvm_ioreq_page {
>      void *va;
>  };
>  
> +struct hvm_io_range {
> +    struct list_head    list_entry;
> +    uint64_t            start, end;
> +    struct rcu_head     rcu;
> +};
> +
> +struct hvm_pcidev {
> +    struct list_head    list_entry;
> +    uint16_t            bdf;
> +    struct rcu_head     rcu;
> +};
> +
>  struct hvm_ioreq_server {
> +    struct list_head       list_entry;
> +    ioservid_t             id;
>      struct domain          *domain;
>      domid_t                domid;
>      struct hvm_ioreq_page  ioreq;
>      int                    ioreq_evtchn[MAX_HVM_VCPUS];
>      struct hvm_ioreq_page  buf_ioreq;
>      int                    buf_ioreq_evtchn;
> +    struct list_head       mmio_range_list;
> +    struct list_head       portio_range_list;
> +    struct list_head       pcidev_list;
>  };
>  
>  struct hvm_domain {
> -    struct hvm_ioreq_server *ioreq_server;
> +    struct list_head        ioreq_server_list;
> +    spinlock_t              ioreq_server_lock;
> +    uint32_t                pci_cf8;
> +    spinlock_t              pci_lock;
> +
>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 40aeddf..4118669 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -229,6 +229,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
>  void destroy_ring_for_helper(void **_va, struct page_info *page);
>  
>  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
>  
>  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
>  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index a9aab4b..6b31189 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -23,6 +23,7 @@
>  
>  #include "../xen.h"
>  #include "../trace.h"
> +#include "../event_channel.h"
>  
>  /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
>  #define HVMOP_set_param           0
> @@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
>  typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
>  
> +typedef uint32_t ioservid_t;
> +
> +DEFINE_XEN_GUEST_HANDLE(ioservid_t);
> +
> +#define HVMOP_create_ioreq_server 17
> +struct xen_hvm_create_ioreq_server {
> +    domid_t domid;  /* IN - domain to be serviced */
> +    ioservid_t id;  /* OUT - server id */
> +};
> +typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
> +
> +#define HVMOP_get_ioreq_server_info 18
> +struct xen_hvm_get_ioreq_server_info {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - server id */
> +    xen_pfn_t pfn;          /* OUT - ioreq pfn */
> +    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
> +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> +};
> +typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> +
> +#define HVMOP_map_io_range_to_ioreq_server 19
> +struct xen_hvm_map_io_range_to_ioreq_server {
> +    domid_t domid;                  /* IN - domain to be serviced */
> +    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> +    uint8_t is_mmio;                /* IN - MMIO or port IO? */
> +    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> +};
> +typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_io_range_from_ioreq_server 20
> +struct xen_hvm_unmap_io_range_from_ioreq_server {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
> +    uint8_t is_mmio;        /* IN - MMIO or port IO? */
> +    uint64_aligned_t start; /* IN - start address of the range to remove */
> +};
> +typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
> +
> +#define HVMOP_map_pcidev_to_ioreq_server 21
> +struct xen_hvm_map_pcidev_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> +};
> +typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_pcidev_from_ioreq_server 22
> +struct xen_hvm_unmap_pcidev_from_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> +};
> +typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
> +
> +#define HVMOP_destroy_ioreq_server 23
> +struct xen_hvm_destroy_ioreq_server {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - server id */
> +};
> +typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index f05d130..e84fa75 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -34,6 +34,7 @@
>  
>  #define IOREQ_TYPE_PIO          0 /* pio */
>  #define IOREQ_TYPE_COPY         1 /* mmio ops */
> +#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
>  #define IOREQ_TYPE_TIMEOFFSET   7
>  #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
>  
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 517a184..4109b11 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -145,6 +145,8 @@
>  /* SHUTDOWN_* action in case of a triple fault */
>  #define HVM_PARAM_TRIPLE_FAULT_REASON 31
>  
> -#define HVM_NR_PARAMS          32
> +#define HVM_PARAM_NR_IOREQ_SERVERS 32
> +
> +#define HVM_NR_PARAMS          33
>  
>  #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures
  2014-03-04 11:40 ` [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-03-04 12:21   ` Jan Beulich
  2014-03-04 17:25     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2014-03-04 12:21 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
> @@ -232,20 +211,15 @@ static int hvmemul_do_io(
>              vio->io_state = HVMIO_handle_mmio_awaiting_completion;
>          break;
>      case X86EMUL_UNHANDLEABLE:
> -        /* If there is no backing DM, just ignore accesses */
> -        if ( !has_dm )
> +        rc = X86EMUL_RETRY;
> +        if ( !hvm_send_assist_req(curr, p) )
>          {
>              rc = X86EMUL_OKAY;
>              vio->io_state = HVMIO_none;
>          }
> -        else
> -        {
> -            rc = X86EMUL_RETRY;
> -            if ( !hvm_send_assist_req(curr) )
> -                vio->io_state = HVMIO_none;
> -            else if ( p_data == NULL )
> -                rc = X86EMUL_OKAY;
> -        }
> +        else if ( p_data == NULL )
> +            rc = X86EMUL_OKAY;
> +
>          break;

Is the rc value change here really intentional? Previously, when
!hvm_send_assist_req(), rc would end up being X86EMUL_RETRY,
while now it gets set to X86EMUL_OKAY. After all, there were
three different paths originally (setting rc to X86EMUL_OKAY
and/or setting vio->io_state to HVMIO_none), and you can't
express this with the bool_t return type of hvm_send_assist_req().
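
In other words, keeping the old behaviour on that path would look
something like this (sketch only; whether that is what is intended is
precisely the question):

    case X86EMUL_UNHANDLEABLE:
        rc = X86EMUL_RETRY;
        if ( !hvm_send_assist_req(curr, p) )
            vio->io_state = HVMIO_none;    /* rc stays X86EMUL_RETRY */
        else if ( p_data == NULL )
            rc = X86EMUL_OKAY;
        break;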

> +bool_t hvm_io_pending(struct vcpu *v)
> +{
> +    ioreq_t *p;
> +
> +    if ( !(p = get_ioreq(v)) )

I'd prefer if you used the call to get_ioreq() as initializer instead of
in a parenthesized assignment inside a conditional. But yes, it's a
matter of taste...
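
E.g. (sketch of the suggested form; the rest of the function as in the
patch):

    bool_t hvm_io_pending(struct vcpu *v)
    {
        ioreq_t *p = get_ioreq(v);

        if ( !p )
            return 0;

        return p->state != STATE_IOREQ_NONE;
    }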

> @@ -1407,7 +1425,86 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> -bool_t hvm_send_assist_req(struct vcpu *v)
> +int hvm_buffered_io_send(ioreq_t *p)

const ioreq_t *?

> +{
> +    struct vcpu *v = current;

"curr" please.

> +    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
> +    buffered_iopage_t *pg = iorp->va;
> +    buf_ioreq_t bp;
> +    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
> +    int qw = 0;
> +
> +    /* Ensure buffered_iopage fits in a page */
> +    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> +
> +    /*
> +     * Return 0 for the cases we can't deal with:
> +     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> +     *  - we cannot buffer accesses to guest memory buffers, as the guest
> +     *    may expect the memory buffer to be synchronously accessed
> +     *  - the count field is usually used with data_is_ptr and since we don't
> +     *    support data_is_ptr we do not waste space for the count field either
> +     */
> +    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> +        return 0;
> +
> +    bp.type = p->type;
> +    bp.dir  = p->dir;
> +    switch ( p->size )
> +    {
> +    case 1:
> +        bp.size = 0;
> +        break;
> +    case 2:
> +        bp.size = 1;
> +        break;
> +    case 4:
> +        bp.size = 2;
> +        break;
> +    case 8:
> +        bp.size = 3;
> +        qw = 1;
> +        break;
> +    default:
> +        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
> +        return 0;
> +    }
> +    
> +    bp.data = p->data;
> +    bp.addr = p->addr;
> +    
> +    spin_lock(&iorp->lock);
> +
> +    if ( (pg->write_pointer - pg->read_pointer) >=
> +         (IOREQ_BUFFER_SLOT_NUM - qw) )
> +    {
> +        /* The queue is full: send the iopacket through the normal path. */
> +        spin_unlock(&iorp->lock);
> +        return 0;
> +    }
> +    
> +    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
> +           &bp, sizeof(bp));

Better be type safe using an assignment here?
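
i.e. something like (sketch):

    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;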

> +    

Line of only spaces.

> +    if ( qw )
> +    {
> +        bp.data = p->data >> 32;
> +        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
> +               &bp, sizeof(bp));
> +    }
> +
> +    /* Make the ioreq_t visible /before/ write_pointer. */
> +    wmb();
> +    pg->write_pointer += qw ? 2 : 1;
> +
> +    notify_via_xen_event_channel(v->domain,
> +            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> +    spin_unlock(&iorp->lock);

Perhaps worth caching v->domain into a local variable?
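
E.g. (sketch):

    struct domain *d = v->domain;

    /* ... */
    notify_via_xen_event_channel(d,
        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);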

> +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)

const ioreq_t *

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction.
  2014-03-04 11:40 ` [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-03-04 12:50   ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2014-03-04 12:50 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
>  bool_t hvm_io_pending(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;

Hardly worth having "d" as a separate variable here, considering its
only use is the one above. Same elsewhere in the patch.
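
i.e. simply (sketch):

    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;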

> +static int hvm_init_ioreq_server(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s;
> +    int i;

"unsigned int" please.

> +static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
> +{
> +    struct domain *d = s->domain;
> +
> +    if ( s->ioreq.va != NULL )
> +    {
> +        shared_iopage_t *p = s->ioreq.va;
> +        struct vcpu *v;

Please be consistent - either all variables needed only inside the
if() body should get declared inside that body, or all of them at
the top of the function.

> +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
> +{
> +    if ( v->vcpu_id == 0 )
> +    {
> +        if ( s->buf_ioreq_evtchn >= 0 )

Please fold these if()s together.
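
i.e. (sketch, with the body elided):

    if ( (v->vcpu_id == 0) && (s->buf_ioreq_evtchn >= 0) )
    {
        /* ... free the buffered ioreq event channel ... */
    }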

> +static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
> +{
> +    struct domain *d = s->domain;
> +    struct vcpu *v;
> +    int rc = 0;
> +
> +    domain_pause(d);
> +
> +    if ( d->vcpu[0] )
> +    {
> +        rc = hvm_replace_event_channel(d->vcpu[0], domid, &s->buf_ioreq_evtchn);
> +        if ( rc < 0 )
> +            goto done;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v->vcpu_id]);
> +        if ( rc < 0 )
> +            goto done;
> +    }
> +
> +    hvm_update_ioreq_server_evtchn(s);
> +
> +    s->domid = domid;
> +
> +done:

In an earlier function you set rc to zero right before a similar final
label. Here you don't. Please be consistent again. And while I
personally would favor avoiding pointless assignments, I wouldn't
want the code to apparently depend on functions only ever
returning non-positive values. I.e. I'd prefer the respective error
checks here and elsewhere to say "if ( rc )" instead of "if ( rc < 0 )".
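
i.e. (sketch of the preferred check):

    rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v->vcpu_id]);
    if ( rc )
        goto done;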

> @@ -4379,6 +4493,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          {
>              switch ( a.index )
>              {
> +            case HVM_PARAM_BUFIOREQ_EVTCHN: {
> +                struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +
> +                a.value = s->buf_ioreq_evtchn;
> +                break;
> +            }

Why? If d->arch.hvm_domain.params[a.index] can get out of sync,
perhaps it would be better to get it synchronized again.
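
For illustration only (where this write-back belongs is an assumption,
not something the quoted hunk shows):

    /* after (re)binding the buffered ioreq event channel */
    d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = s->buf_ioreq_evtchn;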

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -41,10 +41,17 @@ struct hvm_ioreq_page {
>      void *va;
>  };
>  
> -struct hvm_domain {
> +struct hvm_ioreq_server {
> +    struct domain          *domain;
> +    domid_t                domid;

These two fields are too generic to be here without any comment:
It's unclear without looking at the rest of the patch/code which one
refers to the subject domain and which one represents the server.
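
E.g. (sketch; the roles are inferred from the rest of the patch rather
than stated there):

    struct hvm_ioreq_server {
        struct domain          *domain; /* domain whose I/O is being handled */
        domid_t                domid;   /* domain running the emulator */
        /* ... */
    };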

>      struct hvm_ioreq_page  ioreq;
> +    int                    ioreq_evtchn[MAX_HVM_VCPUS];

Ugly. Would be much better if this was stored in each struct vcpu.
And considering that we likely won't stay at 128 vCPU-s as the
upper bound indefinitely (we already don't strictly need to with
x2APIC emulation support in place), this array could eventually
grow quite large, to the point of making the allocation exceed the
1 page boundary we (informally) have in place for all runtime
allocations.

And _if_ this needs to remain an array, it should be the last item
in the structure, as to not badly affect code size for accesses to
subsequent members.

And finally, while I realize current code is not consistent in that
regard, please try using evtchn_port_t for event channels at
least in new code additions.
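
Something along these lines, purely as a sketch (the struct hvm_vcpu
placement and the field name are assumptions, not part of the patch):

    struct hvm_vcpu {
        /* ... */
        evtchn_port_t          ioreq_evtchn; /* per-vCPU, not a MAX_HVM_VCPUS array */
    };

    struct hvm_ioreq_server {
        /* ... */
        evtchn_port_t          buf_ioreq_evtchn;
    };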

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-04 11:40 ` [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-03-04 13:02   ` Jan Beulich
  2014-03-04 13:30     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2014-03-04 13:02 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
> +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
>  {
> +    struct hvm_ioreq_page *iorp;
> +
> +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;

Pointless parentheses and spaces (also further down).

Also "buf" appears to be boolean, so please reflect this by using
bool_t.
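
i.e. (sketch):

    static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
    {
        struct hvm_ioreq_page *iorp = buf ? &s->buf_ioreq : &s->ioreq;
        /* ... */
    }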

> @@ -557,38 +557,6 @@ static int handle_pvh_io(
>      return X86EMUL_OKAY;
>  }
>  
> -static int hvm_init_ioreq_server(struct domain *d)
> -{
> -    struct hvm_ioreq_server *s;
> -    int i;
> -
> -    s = xzalloc(struct hvm_ioreq_server);
> -    if ( !s )
> -        return -ENOMEM;
> -
> -    s->domain = d;
> -
> -    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> -        s->ioreq_evtchn[i] = -1;
> -    s->buf_ioreq_evtchn = -1;
> -
> -    hvm_init_ioreq_page(d, &s->ioreq);
> -    hvm_init_ioreq_page(d, &s->buf_ioreq);
> -
> -    d->arch.hvm_domain.ioreq_server = s;
> -    return 0;
> -}
> -
> -static void hvm_deinit_ioreq_server(struct domain *d)
> -{
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -
> -    hvm_destroy_ioreq_page(d, &s->ioreq);
> -    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> -
> -    xfree(s);
> -}

Odd - you completely remove code here that an earlier patch of
this series added, and you put it back in altered form further down.
The original patch in this case would better add it in the final place
right away, so that the diff actually tells the reader what changes.

> +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +{
> +    struct hvm_ioreq_server *s;
> +    unsigned long pfn;
> +    struct vcpu *v;
> +    int i, rc;
> +
> +    rc = -EEXIST;
> +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> +        goto fail_exist;

Please don't use goto-s without actual need.
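
e.g. simply (sketch):

    if ( d->arch.hvm_domain.ioreq_server != NULL )
        return -EEXIST;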

> +
> +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);

That's a debugging left-over I assume?

> +fail_add_vcpu:
> +    for_each_vcpu ( d, v )
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    domain_unpause(d);
> +    hvm_destroy_ioreq_page(s, 1);
> +fail_set_buf_ioreq:
> +    hvm_destroy_ioreq_page(s, 0);
> +fail_set_ioreq:
> +    xfree(s);
> +fail_alloc:
> +fail_exist:

Labels should be indented by one space.

> +static void hvm_destroy_ioreq_server(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct vcpu *v;
> +
> +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);

Again a leftover?

> @@ -697,27 +790,6 @@ done:
>      return rc;
>  }
>  
> -static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> -{
> -    struct domain *d = s->domain;
> -    int rc;
> -
> -    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> -    if ( rc < 0 )
> -        return rc;
> -
> -    hvm_update_ioreq_server_evtchn(s);
> -
> -    return 0;
> -}
> -
> -static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> -{
> -    struct domain *d = s->domain;
> -
> -    return hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
> -}

Again code an earlier patch added getting moved around?

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-04 13:02   ` Jan Beulich
@ 2014-03-04 13:30     ` Paul Durrant
  2014-03-04 15:43       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 13:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 04 March 2014 13:03
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v2 4/6] ioreq-server: on-demand creation of
> ioreq server
> 
> >>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
> > +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
> >  {
> > +    struct hvm_ioreq_page *iorp;
> > +
> > +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
> 
> Pointless parentheses and spaces (also further down).
> 

In general I prefer to parenthesize the conditional as a nod to this essentially being an if-then-else. Is that outlawed?

> Also "buf" appears to be boolean, so please reflect this by using
> bool_t.
> 
> > @@ -557,38 +557,6 @@ static int handle_pvh_io(
> >      return X86EMUL_OKAY;
> >  }
> >
> > -static int hvm_init_ioreq_server(struct domain *d)
> > -{
> > -    struct hvm_ioreq_server *s;
> > -    int i;
> > -
> > -    s = xzalloc(struct hvm_ioreq_server);
> > -    if ( !s )
> > -        return -ENOMEM;
> > -
> > -    s->domain = d;
> > -
> > -    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> > -        s->ioreq_evtchn[i] = -1;
> > -    s->buf_ioreq_evtchn = -1;
> > -
> > -    hvm_init_ioreq_page(d, &s->ioreq);
> > -    hvm_init_ioreq_page(d, &s->buf_ioreq);
> > -
> > -    d->arch.hvm_domain.ioreq_server = s;
> > -    return 0;
> > -}
> > -
> > -static void hvm_deinit_ioreq_server(struct domain *d)
> > -{
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -
> > -    hvm_destroy_ioreq_page(d, &s->ioreq);
> > -    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> > -
> > -    xfree(s);
> > -}
> 
> Odd - you completely remove code here that an earlier patch of
> this series added, and you put it back in altered form further down.
> The original patch in this case would better add it in the final place
> right away, so that the diff actually tells the reader what changes.
> 

Well, it is slightly different code. I wanted a sequence where we moved from an ioreq server struct that was just an abstraction within the domain struct (and had the same lifecycle) to one that is created on-demand. This is why the code moves and changes name. It seems like a reasonable way to sequence things to me. I could, of course, do all this in a monolithic patch but it would probably be harder to review.

> > +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    unsigned long pfn;
> > +    struct vcpu *v;
> > +    int i, rc;
> > +
> > +    rc = -EEXIST;
> > +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +        goto fail_exist;
> 
> Please don't use goto-s without actual need.
> 
> > +
> > +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> 
> That's a debugging left-over I assume?
> 

Yes.

> > +fail_add_vcpu:
> > +    for_each_vcpu ( d, v )
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> > +    domain_unpause(d);
> > +    hvm_destroy_ioreq_page(s, 1);
> > +fail_set_buf_ioreq:
> > +    hvm_destroy_ioreq_page(s, 0);
> > +fail_set_ioreq:
> > +    xfree(s);
> > +fail_alloc:
> > +fail_exist:
> 
> Labels should be indented by one space.
> 

I couldn't find anything in CODING_STYLE concerning labels, but I'll follow suit.

> > +static void hvm_destroy_ioreq_server(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct vcpu *v;
> > +
> > +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> 
> Again a leftover?
> 

Yep - should be XENLOG_DEBUG.

> > @@ -697,27 +790,6 @@ done:
> >      return rc;
> >  }
> >
> > -static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > -{
> > -    struct domain *d = s->domain;
> > -    int rc;
> > -
> > -    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> > -    if ( rc < 0 )
> > -        return rc;
> > -
> > -    hvm_update_ioreq_server_evtchn(s);
> > -
> > -    return 0;
> > -}
> > -
> > -static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > -{
> > -    struct domain *d = s->domain;
> > -
> > -    return hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
> > -}
> 
> Again code an earlier patch added getting moved around?
> 

For similar reasons.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server
  2014-03-04 13:30     ` Paul Durrant
@ 2014-03-04 15:43       ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2014-03-04 15:43 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel@lists.xen.org

>>> On 04.03.14 at 14:30, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 04 March 2014 13:03
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org 
>> Subject: Re: [Xen-devel] [PATCH v2 4/6] ioreq-server: on-demand creation of
>> ioreq server
>> 
>> >>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
>> > +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
>> >  {
>> > +    struct hvm_ioreq_page *iorp;
>> > +
>> > +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
>> 
>> Pointless parentheses and spaces (also further down).
>> 
> 
> In general I prefer to parenthesize the conditional as a nod to this 
> essentially being an if-then-else. Is that outlawed?

It's contrary to how all other code does it - optionally use
parentheses when the condition is a more involved expression,
but don't use them when it's a simple identifier or a simple
(generally taken to mean unary) expression.
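
To illustrate with the line in question (purely illustrative, not an
excerpt from the tree):

    iorp = buf ? &s->buf_ioreq : &s->ioreq;             /* simple identifier - no parentheses */

    iorp = (p->size == 8) ? &s->buf_ioreq : &s->ioreq;  /* more involved expression - parentheses are fine */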

>> Odd - you completely remove code here that an earlier patch of
>> this series added, and you put it back in altered form further down.
>> The original patch in this case would better add it in the final place
>> right away, so that the diff actually tells the reader what changes.
>> 
> 
> Well, it is slightly different code. I wanted a sequence where we moved from
> an ioreq server struct that was just an abstraction within the domain struct 
> (and had the same lifecycle) to one that is created on-demand. This is why 
> the code moves and changes name. It seems like a reasonable way to sequence 
> things to me. I could, of course, do all this in a monolithic patch but it 
> would probably be harder to review.

Indeed it would be. What I'm asking for is that you put the code
in the right place from the beginning, so that the patch then shows
the actual change you make to it.

>> > +fail_add_vcpu:
>> > +    for_each_vcpu ( d, v )
>> > +        hvm_ioreq_server_remove_vcpu(s, v);
>> > +    domain_unpause(d);
>> > +    hvm_destroy_ioreq_page(s, 1);
>> > +fail_set_buf_ioreq:
>> > +    hvm_destroy_ioreq_page(s, 0);
>> > +fail_set_ioreq:
>> > +    xfree(s);
>> > +fail_alloc:
>> > +fail_exist:
>> 
>> Labels should be indented by one space.
>> 
> 
> I couldn't find anything in CODING_STYLE concerning labels, but I'll follow 
> suit.

The main reason for this is to not have the labels show up as -p
diff/patch context - we want the function name here.
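
For illustration (made-up line numbers): with a label at column zero a later
hunk inside the same function can get reported as

    @@ -700,7 +700,7 @@ fail_alloc:

whereas with the one space of indentation diff -p falls back to the enclosing
function header, which is the context one actually wants:

    @@ -700,7 +700,7 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)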

>> > +static void hvm_destroy_ioreq_server(struct domain *d)
>> > +{
>> > +    struct hvm_ioreq_server *s;
>> > +    struct vcpu *v;
>> > +
>> > +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
>> 
>> Again a leftover?
>> 
> 
> Yep - should be XENLOG_DEBUG.

Not even that imo.

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures
  2014-03-04 12:21   ` Jan Beulich
@ 2014-03-04 17:25     ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-04 17:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 04 March 2014 12:21
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v2 1/6] ioreq-server: centralize access to
> ioreq structures
> 
> >>> On 04.03.14 at 12:40, Paul Durrant <paul.durrant@citrix.com> wrote:
> > @@ -232,20 +211,15 @@ static int hvmemul_do_io(
> >              vio->io_state = HVMIO_handle_mmio_awaiting_completion;
> >          break;
> >      case X86EMUL_UNHANDLEABLE:
> > -        /* If there is no backing DM, just ignore accesses */
> > -        if ( !has_dm )
> > +        rc = X86EMUL_RETRY;
> > +        if ( !hvm_send_assist_req(curr, p) )
> >          {
> >              rc = X86EMUL_OKAY;
> >              vio->io_state = HVMIO_none;
> >          }
> > -        else
> > -        {
> > -            rc = X86EMUL_RETRY;
> > -            if ( !hvm_send_assist_req(curr) )
> > -                vio->io_state = HVMIO_none;
> > -            else if ( p_data == NULL )
> > -                rc = X86EMUL_OKAY;
> > -        }
> > +        else if ( p_data == NULL )
> > +            rc = X86EMUL_OKAY;
> > +
> >          break;
> 
> Is the rc value change here really intentional? Previously, when
> !hvm_send_assist_req(), rc would end up being X86EMUL_RETRY,
> while now it gets set to X86EMUL_OKAY. After all, there were
> three different paths originally (setting rc to X86EMUL_OKAY
> and/or setting vio->io_state to HVMIO_none, and you can't
> express this with the bool_t return type of hvm_send_assist_req().
> 

It was intentional, but I'm no longer sure it's the right thing to do. I'll re-work things to stick more closely to the original code by introducing a new hvm_has_dm() function to replace the stack boolean.
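
Something along these lines is what I have in mind (just a sketch; the exact
check will be firmed up when I rework the patch, and the field access assumes
the pre-existing per-domain ioreq page):

    static bool_t hvm_has_dm(struct domain *d)
    {
        /* Mirror the old stack boolean: is a shared ioreq page set up? */
        return !!d->arch.hvm_domain.ioreq.va;
    }

so hvmemul_do_io() can retain the original three-way handling by testing
hvm_has_dm(curr->domain) before attempting hvm_send_assist_req().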

> > +bool_t hvm_io_pending(struct vcpu *v)
> > +{
> > +    ioreq_t *p;
> > +
> > +    if ( !(p = get_ioreq(v)) )
> 
> I'd prefer if you used the call to get_ioreq() as initializer instead of
> in a parenthesized assignment inside a conditional. But yes, it's a
> matter of taste...
> 

Ok. Fair enough.
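
I.e. something like (sketch):

    ioreq_t *p = get_ioreq(v);

    if ( !p )
        return 0;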

> > @@ -1407,7 +1425,86 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v)
> > +int hvm_buffered_io_send(ioreq_t *p)
> 
> const ioreq_t *?
> 

Yes.

> > +{
> > +    struct vcpu *v = current;
> 
> "curr" please.
> 

Ok.

> > +    struct hvm_ioreq_page *iorp = &v->domain-
> >arch.hvm_domain.buf_ioreq;
> > +    buffered_iopage_t *pg = iorp->va;
> > +    buf_ioreq_t bp;
> > +    /* Timeoffset sends 64b data, but no address. Use two consecutive
> slots. */
> > +    int qw = 0;
> > +
> > +    /* Ensure buffered_iopage fits in a page */
> > +    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> > +
> > +    /*
> > +     * Return 0 for the cases we can't deal with:
> > +     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> > +     *  - we cannot buffer accesses to guest memory buffers, as the guest
> > +     *    may expect the memory buffer to be synchronously accessed
> > +     *  - the count field is usually used with data_is_ptr and since we don't
> > +     *    support data_is_ptr we do not waste space for the count field
> either
> > +     */
> > +    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> > +        return 0;
> > +
> > +    bp.type = p->type;
> > +    bp.dir  = p->dir;
> > +    switch ( p->size )
> > +    {
> > +    case 1:
> > +        bp.size = 0;
> > +        break;
> > +    case 2:
> > +        bp.size = 1;
> > +        break;
> > +    case 4:
> > +        bp.size = 2;
> > +        break;
> > +    case 8:
> > +        bp.size = 3;
> > +        qw = 1;
> > +        break;
> > +    default:
> > +        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p-
> >size);
> > +        return 0;
> > +    }
> > +
> > +    bp.data = p->data;
> > +    bp.addr = p->addr;
> > +
> > +    spin_lock(&iorp->lock);
> > +
> > +    if ( (pg->write_pointer - pg->read_pointer) >=
> > +         (IOREQ_BUFFER_SLOT_NUM - qw) )
> > +    {
> > +        /* The queue is full: send the iopacket through the normal path. */
> > +        spin_unlock(&iorp->lock);
> > +        return 0;
> > +    }
> > +
> > +    memcpy(&pg->buf_ioreq[pg->write_pointer %
> IOREQ_BUFFER_SLOT_NUM],
> > +           &bp, sizeof(bp));
> 
> Better be type safe using an assignment here?
> 

I didn't write this code, but I'll fix it up since I'm moving it.
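
I.e. along the lines of (sketch; the second slot in the qw case gets the same
treatment):

    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;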

> > +
> 
> Line of only spaces.
> 

Again, I'll fix this.

> > +    if ( qw )
> > +    {
> > +        bp.data = p->data >> 32;
> > +        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) %
> IOREQ_BUFFER_SLOT_NUM],
> > +               &bp, sizeof(bp));
> > +    }
> > +
> > +    /* Make the ioreq_t visible /before/ write_pointer. */
> > +    wmb();
> > +    pg->write_pointer += qw ? 2 : 1;
> > +
> > +    notify_via_xen_event_channel(v->domain,
> > +            v->domain-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> > +    spin_unlock(&iorp->lock);
> 
> Perhaps worth caching v->domain into a local variable?
> 

Ok.
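
E.g. (sketch, assuming the v -> curr rename above):

    struct domain *d = curr->domain;

    notify_via_xen_event_channel(d,
                                 d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);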

> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
> 
> const ioreq_t *
> 

Yep.

Thanks,

  Paul

> Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-04 12:06   ` Andrew Cooper
@ 2014-03-05 14:44     ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-05 14:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 04 March 2014 12:06
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v2 5/6] ioreq-server: add support for
> multiple servers
> 
> On 04/03/14 11:40, Paul Durrant wrote:
> > The legacy 'catch-all' server is always created with id 0. Secondary
> > servers will have an id ranging from 1 to a limit set by the toolstack
> > via the 'max_emulators' build info field. This defaults to 1 so ordinarily
> > no extra special pages are reserved for secondary emulators. It may be
> > increased using the secondary_device_emulators parameter in xl.cfg(5).
> > There's no clear limit to apply to the number of emulators so I've not
> > applied one.
> >
> > Because of the re-arrangement of the special pages in a previous patch we
> > only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS
> to determine
> > the layout of the shared pages for multiple emulators. Guests migrated in
> > from hosts without this patch will be lacking the save record which stores
> > the new parameter and so the guest is assumed to only have had a single
> > emulator.
> >
> > Added some more emacs boilerplate to xenctrl.h and xenguest.h
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> How does the build param interact with the hvmparam?  It appears not to.
> 

The build param is the value used to set the HVM param on domain create. (See setup_guest() just below where all the special PFNs are set).

> On migrate, the receiving side will have to know, out-of-band, what to
> set max_emulators to when building the domain.  The setparam code needs
> to validate the hvmparam against the build param and return
> -EINVAL/-E2BIG in the case that the hvmparam is too large.
> xc_domain_restore() needs to detect this and abort the migration if the
> guest can't be restored with the expected number of emulators.
> 

I don't think we need any of this. The HVM param is in the save record and so will be preserved. Migration in from a prior version of Xen will initialize the value to 1, which is correct for that case. The only need for max_emulators is to reserve the correct number of special pages when a new domain is built, so this is irrelevant for a domain restore; the only important thing is that the maximum number of emulators should never exceed the value of max_emulators when the domain was originally built, and the preservation of the HVM param ensures this.

  Paul

> ~Andrew
> 
> > ---
> >  docs/man/xl.cfg.pod.5            |    7 +
> >  tools/libxc/xc_domain.c          |  175 +++++++
> >  tools/libxc/xc_domain_restore.c  |   20 +
> >  tools/libxc/xc_domain_save.c     |   12 +
> >  tools/libxc/xc_hvm_build_x86.c   |   24 +-
> >  tools/libxc/xenctrl.h            |   51 ++
> >  tools/libxc/xenguest.h           |   12 +
> >  tools/libxc/xg_save_restore.h    |    1 +
> >  tools/libxl/libxl.h              |    8 +
> >  tools/libxl/libxl_create.c       |    3 +
> >  tools/libxl/libxl_dom.c          |    1 +
> >  tools/libxl/libxl_types.idl      |    1 +
> >  tools/libxl/xl_cmdimpl.c         |    3 +
> >  xen/arch/x86/hvm/hvm.c           |  951
> +++++++++++++++++++++++++++++++++++---
> >  xen/arch/x86/hvm/io.c            |    2 +-
> >  xen/include/asm-x86/hvm/domain.h |   23 +-
> >  xen/include/asm-x86/hvm/hvm.h    |    1 +
> >  xen/include/public/hvm/hvm_op.h  |   70 +++
> >  xen/include/public/hvm/ioreq.h   |    1 +
> >  xen/include/public/hvm/params.h  |    4 +-
> >  20 files changed, 1300 insertions(+), 70 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > index e15a49f..0226c55 100644
> > --- a/docs/man/xl.cfg.pod.5
> > +++ b/docs/man/xl.cfg.pod.5
> > @@ -1281,6 +1281,13 @@ specified, enabling the use of XenServer PV
> drivers in the guest.
> >  This parameter only takes effect when device_model_version=qemu-xen.
> >  See F<docs/misc/pci-device-reservations.txt> for more information.
> >
> > +=item B<secondary_device_emulators=NUMBER>
> > +
> > +If a number of secondary device emulators (i.e. in addition to
> > +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> > +guest then this parameter can be set with the count of how many are
> > +to be used. The default value is zero.
> > +
> >  =back
> >
> >  =head2 Device-Model Options
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..dfa905b 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle,
> domid_t dom, int param, unsigned long
> >      return rc;
> >  }
> >
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +                               domid_t domid,
> > +                               ioservid_t *id)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    *id = arg->id;
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > +                                 domid_t domid,
> > +                                 ioservid_t id,
> > +                                 xen_pfn_t *pfn,
> > +                                 xen_pfn_t *buf_pfn,
> > +                                 evtchn_port_t *buf_port)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    if ( rc != 0 )
> > +        goto done;
> > +
> > +    if ( pfn )
> > +        *pfn = arg->pfn;
> > +
> > +    if ( buf_pfn )
> > +        *buf_pfn = arg->buf_pfn;
> > +
> > +    if ( buf_port )
> > +        *buf_port = arg->buf_port;
> > +
> > +done:
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                        ioservid_t id, int is_mmio,
> > +                                        uint64_t start, uint64_t end)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->is_mmio = is_mmio;
> > +    arg->start = start;
> > +    arg->end = end;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> > +                                            ioservid_t id, int is_mmio,
> > +                                            uint64_t start)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_ser
> ver_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->is_mmio = is_mmio;
> > +    arg->start = start;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                      ioservid_t id, uint16_t bdf)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->bdf = bdf;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> > +                                          ioservid_t id, uint16_t bdf)
> > +{
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_serve
> r_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->bdf = bdf;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> > +                                domid_t domid,
> > +                                ioservid_t id)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> >  int xc_domain_setdebugging(xc_interface *xch,
> >                             uint32_t domid,
> >                             unsigned int enable)
> > diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> > index 1f6ce50..3116653 100644
> > --- a/tools/libxc/xc_domain_restore.c
> > +++ b/tools/libxc/xc_domain_restore.c
> > @@ -746,6 +746,7 @@ typedef struct {
> >      uint64_t acpi_ioport_location;
> >      uint64_t viridian;
> >      uint64_t vm_generationid_addr;
> > +    uint64_t nr_ioreq_servers;
> >
> >      struct toolstack_data_t tdata;
> >  } pagebuf_t;
> > @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch,
> struct restore_ctx *ctx,
> >          DPRINTF("read generation id buffer address");
> >          return pagebuf_get_one(xch, ctx, buf, fd, dom);
> >
> > +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> > +        /* Skip padding 4 bytes then read the number of IOREQ servers. */
> > +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> > +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> > +        {
> > +            PERROR("error reading the number of IOREQ servers");
> > +            return -1;
> > +        }
> > +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> > +
> >      default:
> >          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
> >              ERROR("Max batch size exceeded (%d). Giving up.", count);
> > @@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> >      if (pagebuf.viridian != 0)
> >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> >
> > +    if ( hvm ) {
> > +        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
> > +
> > +        if ( nr_ioreq_servers == 0 )
> > +            nr_ioreq_servers = 1;
> > +
> > +        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> nr_ioreq_servers);
> > +    }
> > +
> >      if (pagebuf.acpi_ioport_location == 1) {
> >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> >          xc_set_hvm_param(xch, dom,
> HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> > index 42c4752..3293e29 100644
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> > @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd,
> uint32_t dom, uint32_t max_iter
> >              PERROR("Error when writing the viridian flag");
> >              goto out;
> >          }
> > +
> > +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> > +        chunk.data = 0;
> > +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > +                         (unsigned long *)&chunk.data);
> > +
> > +        if ( (chunk.data != 0) &&
> > +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> > +        {
> > +            PERROR("Error when writing the number of IOREQ servers");
> > +            goto out;
> > +        }
> >      }
> >
> >      if ( callbacks != NULL && callbacks->toolstack_save != NULL )
> > diff --git a/tools/libxc/xc_hvm_build_x86.c
> b/tools/libxc/xc_hvm_build_x86.c
> > index b65e702..6d6328a 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -45,7 +45,7 @@
> >  #define SPECIALPAGE_IDENT_PT 4
> >  #define SPECIALPAGE_CONSOLE  5
> >  #define SPECIALPAGE_IOREQ    6
> > -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server
> needs 2 pages */
> > +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq
> server needs 2 pages */
> >  #define special_pfn(x) (0xff000u - 1 - (x))
> >
> >  #define VGA_HOLE_SIZE (0x20)
> > @@ -85,7 +85,8 @@ static int modules_init(struct xc_hvm_build_args
> *args,
> >  }
> >
> >  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> > -                           uint64_t mmio_start, uint64_t mmio_size)
> > +                           uint64_t mmio_start, uint64_t mmio_size,
> > +                           int max_emulators)
> >  {
> >      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
> >          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> > @@ -113,7 +114,7 @@ static void build_hvm_info(void *hvm_info_page,
> uint64_t mem_size,
> >      /* Memory parameters. */
> >      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
> >      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> > -    hvm_info->reserved_mem_pgstart = special_pfn(0) -
> NR_SPECIAL_PAGES;
> > +    hvm_info->reserved_mem_pgstart = special_pfn(0) -
> NR_SPECIAL_PAGES(max_emulators);
> >
> >      /* Finish with the checksum. */
> >      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> > @@ -256,6 +257,10 @@ static int setup_guest(xc_interface *xch,
> >          stat_1gb_pages = 0;
> >      int pod_mode = 0;
> >      int claim_enabled = args->claim_enabled;
> > +    int max_emulators = args->max_emulators;
> > +
> > +    if ( max_emulators < 1 )
> > +        goto error_out;
> >
> >      if ( nr_pages > target_pages )
> >          pod_mode = XENMEMF_populate_on_demand;
> > @@ -468,12 +473,13 @@ static int setup_guest(xc_interface *xch,
> >                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
> >                HVM_INFO_PFN)) == NULL )
> >          goto error_out;
> > -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> > +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
> > +                   max_emulators);
> >      munmap(hvm_info_page, PAGE_SIZE);
> >
> >      /* Allocate and clear special pages. */
> >
> > -    DPRINTF("%d SPECIAL PAGES:\n", NR_SPECIAL_PAGES);
> > +    DPRINTF("%d SPECIAL PAGES:\n",
> NR_SPECIAL_PAGES(max_emulators));
> >      DPRINTF("  PAGING:    %"PRI_xen_pfn"\n",
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING));
> >      DPRINTF("  ACCESS:    %"PRI_xen_pfn"\n",
> > @@ -486,10 +492,10 @@ static int setup_guest(xc_interface *xch,
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT));
> >      DPRINTF("  CONSOLE:   %"PRI_xen_pfn"\n",
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE));
> > -    DPRINTF("  IOREQ:     %"PRI_xen_pfn"\n",
> > +    DPRINTF("  IOREQ(%02d): %"PRI_xen_pfn"\n", max_emulators * 2,
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> >
> > -    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> > +    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
> >      {
> >          xen_pfn_t pfn = special_pfn(i);
> >          rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> > @@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> >                       special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> > +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > +                     max_emulators);
> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index 13f816b..84cab13 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
> >  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long value);
> >  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long *value);
> >
> > +/*
> > + * IOREQ server API
> > + */
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +                               domid_t domid,
> > +                               ioservid_t *id);
> > +
> > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > +                                 domid_t domid,
> > +                                 ioservid_t id,
> > +                                 xen_pfn_t *pfn,
> > +                                 xen_pfn_t *buf_pfn,
> > +                                 evtchn_port_t *buf_port);
> > +
> > +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
> > +                                        domid_t domid,
> > +                                        ioservid_t id,
> > +                                        int is_mmio,
> > +                                        uint64_t start,
> > +                                        uint64_t end);
> > +
> > +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> > +                                            domid_t domid,
> > +                                            ioservid_t id,
> > +                                            int is_mmio,
> > +                                            uint64_t start);
> > +
> > +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
> > +                                      domid_t domid,
> > +                                      ioservid_t id,
> > +                                      uint16_t bdf);
> > +
> > +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> > +                                          domid_t domid,
> > +                                          ioservid_t id,
> > +                                          uint16_t bdf);
> > +
> > +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> > +                                domid_t domid,
> > +                                ioservid_t id);
> > +
> >  /* HVM guest pass-through */
> >  int xc_assign_device(xc_interface *xch,
> >                       uint32_t domid,
> > @@ -2428,3 +2469,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t
> type, uint16_t arch,
> >  int xc_kexec_unload(xc_interface *xch, int type);
> >
> >  #endif /* XENCTRL_H */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> > index a0e30e1..1300933 100644
> > --- a/tools/libxc/xenguest.h
> > +++ b/tools/libxc/xenguest.h
> > @@ -234,6 +234,8 @@ struct xc_hvm_build_args {
> >      struct xc_hvm_firmware_module smbios_module;
> >      /* Whether to use claim hypercall (1 - enable, 0 - disable). */
> >      int claim_enabled;
> > +    /* Maximum number of emulators for VM */
> > +    int max_emulators;
> >  };
> >
> >  /**
> > @@ -306,3 +308,13 @@ xen_pfn_t *xc_map_m2p(xc_interface *xch,
> >                        int prot,
> >                        unsigned long *mfn0);
> >  #endif /* XENGUEST_H */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> > index f859621..5170b7f 100644
> > --- a/tools/libxc/xg_save_restore.h
> > +++ b/tools/libxc/xg_save_restore.h
> > @@ -259,6 +259,7 @@
> >  #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
> >  #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
> >  #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific
> info */
> > +#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
> >
> >  /*
> >  ** We process save/restore/migrate in batches of pages; the below
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 06bbca6..5a70b76 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -95,6 +95,14 @@
> >  #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
> >
> >  /*
> > + * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
> > + * max_emulators field is present in the hvm sections of
> > + * libxl_domain_build_info. This field can be used to reserve
> > + * extra special pages for secondary device emulators.
> > + */
> > +#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
> > +
> > +/*
> >   * libxl ABI compatibility
> >   *
> >   * The only guarantee which libxl makes regarding ABI compatibility
> > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > index a604cd8..cce93d9 100644
> > --- a/tools/libxl/libxl_create.c
> > +++ b/tools/libxl/libxl_create.c
> > @@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc
> *gc,
> >
> >          libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
> >
> > +        if (b_info->u.hvm.max_emulators < 1)
> > +            b_info->u.hvm.max_emulators = 1;
> > +
> >          break;
> >      case LIBXL_DOMAIN_TYPE_PV:
> >          libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
> > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> > index 55f74b2..9de06f9 100644
> > --- a/tools/libxl/libxl_dom.c
> > +++ b/tools/libxl/libxl_dom.c
> > @@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> >      args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb)
> << 10;
> >      args.mem_target = (uint64_t)(info->target_memkb - info-
> >video_memkb) << 10;
> >      args.claim_enabled = libxl_defbool_val(info->claim_mode);
> > +    args.max_emulators = info->u.hvm.max_emulators;
> >      if (libxl__domain_firmware(gc, info, &args)) {
> >          LOG(ERROR, "initializing domain firmware failed");
> >          goto out;
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 649ce50..b707159 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -372,6 +372,7 @@ libxl_domain_build_info =
> Struct("domain_build_info",[
> >                                         ("xen_platform_pci", libxl_defbool),
> >                                         ("usbdevice_list",   libxl_string_list),
> >                                         ("vendor_device",    libxl_vendor_device),
> > +                                       ("max_emulators",    integer),
> >                                         ])),
> >                   ("pv", Struct(None, [("kernel", string),
> >                                        ("slack_memkb", MemKB),
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 4fc46eb..cf9b67d 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1750,6 +1750,9 @@ skip_vfb:
> >
> >              b_info->u.hvm.vendor_device = d;
> >          }
> > +
> > +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> > +            b_info->u.hvm.max_emulators = l + 1;
> >      }
> >
> >      xlu_cfg_destroy(config);
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index fb2dd73..e8b73fa 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct
> hvm_ioreq_server *s, int id)
> >  bool_t hvm_io_pending(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> > +    struct list_head *entry;
> >
> > -    if ( !s )
> > -        return 0;
> > +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> >
> > -    p = get_ioreq(s, v->vcpu_id);
> > -    return ( p->state != STATE_IOREQ_NONE );
> > +        p = get_ioreq(s, v->vcpu_id);
> > +        if ( p->state != STATE_IOREQ_NONE )
> > +            return 1;
> > +    }
> > +
> > +    return 0;
> >  }
> >
> >  static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> > @@ -394,18 +401,20 @@ static void hvm_wait_on_io(struct domain *d,
> ioreq_t *p)
> >  void hvm_do_resume(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct list_head *entry;
> >
> >      check_wakeup_from_wait();
> >
> >      if ( is_hvm_vcpu(v) )
> >          pt_restore_timer(v);
> >
> > -    if ( s )
> > +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
> >      {
> > -        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> >
> > -        hvm_wait_on_io(d, p);
> > +        hvm_wait_on_io(d, get_ioreq(s, v->vcpu_id));
> >      }
> >
> >      /* Inject pending hw/sw trap */
> > @@ -543,6 +552,83 @@ static int hvm_print_line(
> >      return X86EMUL_OKAY;
> >  }
> >
> > +static int hvm_access_cf8(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct vcpu *curr = current;
> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> > +    int rc;
> > +
> > +    BUG_ON(port < 0xcf8);
> > +    port -= 0xcf8;
> > +
> > +    spin_lock(&hd->pci_lock);
> > +
> > +    if ( dir == IOREQ_WRITE )
> > +    {
> > +        switch ( bytes )
> > +        {
> > +        case 4:
> > +            hd->pci_cf8 = *val;
> > +            break;
> > +
> > +        case 2:
> > +        {
> > +            uint32_t mask = 0xffff << (port * 8);
> > +            uint32_t subval = *val << (port * 8);
> > +
> > +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> > +                          (subval & mask);
> > +            break;
> > +        }
> > +
> > +        case 1:
> > +        {
> > +            uint32_t mask = 0xff << (port * 8);
> > +            uint32_t subval = *val << (port * 8);
> > +
> > +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> > +                          (subval & mask);
> > +            break;
> > +        }
> > +
> > +        default:
> > +            break;
> > +        }
> > +
> > +        /* We always need to fall through to the catch all emulator */
> > +        rc = X86EMUL_UNHANDLEABLE;
> > +    }
> > +    else
> > +    {
> > +        switch ( bytes )
> > +        {
> > +        case 4:
> > +            *val = hd->pci_cf8;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        case 2:
> > +            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        case 1:
> > +            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        default:
> > +            rc = X86EMUL_UNHANDLEABLE;
> > +            break;
> > +        }
> > +    }
> > +
> > +    spin_unlock(&hd->pci_lock);
> > +
> > +    return rc;
> > +}
> > +
> >  static int handle_pvh_io(
> >      int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >  {
> > @@ -618,39 +704,53 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
> >      }
> >  }
> >
> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +static int hvm_create_ioreq_server(struct domain *d, ioservid_t id,
> domid_t domid)
> >  {
> >      struct hvm_ioreq_server *s;
> >      unsigned long pfn;
> >      struct vcpu *v;
> >      int i, rc;
> >
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> >      rc = -EEXIST;
> > -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > -        goto fail_exist;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto fail_exist;
> > +    }
> >
> > -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> > +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
> >
> >      rc = -ENOMEM;
> >      s = xzalloc(struct hvm_ioreq_server);
> >      if ( !s )
> >          goto fail_alloc;
> >
> > +    s->id = id;
> >      s->domain = d;
> >      s->domid = domid;
> > +    INIT_LIST_HEAD(&s->mmio_range_list);
> > +    INIT_LIST_HEAD(&s->portio_range_list);
> > +    INIT_LIST_HEAD(&s->pcidev_list);
> >
> >      for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> >          s->ioreq_evtchn[i] = -1;
> >      s->buf_ioreq_evtchn = -1;
> >
> >      /* Initialize shared pages */
> > -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > +    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
> >
> >      hvm_init_ioreq_page(s, 0);
> >      if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
> >          goto fail_set_ioreq;
> >
> > -    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> > +    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s-
> >id;
> >
> >      hvm_init_ioreq_page(s, 1);
> >      if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
> > @@ -664,10 +764,12 @@ static int hvm_create_ioreq_server(struct domain
> *d, domid_t domid)
> >              goto fail_add_vcpu;
> >      }
> >
> > -    d->arch.hvm_domain.ioreq_server = s;
> > +    list_add(&s->list_entry,
> > +             &d->arch.hvm_domain.ioreq_server_list);
> >
> >      domain_unpause(d);
> >
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >      return 0;
> >
> >  fail_add_vcpu:
> > @@ -681,23 +783,33 @@ fail_set_ioreq:
> >      xfree(s);
> >  fail_alloc:
> >  fail_exist:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >      return rc;
> >  }
> >
> > -static void hvm_destroy_ioreq_server(struct domain *d)
> > +static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
> >  {
> >      struct hvm_ioreq_server *s;
> >      struct vcpu *v;
> >
> > -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> > -    s = d->arch.hvm_domain.ioreq_server;
> > -    if ( !s )
> > -        return;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry)
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> > +    goto done;
> > +
> > +found:
> > +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
> >
> >      domain_pause(d);
> >
> > -    d->arch.hvm_domain.ioreq_server = NULL;
> > +    list_del_init(&s->list_entry);
> >
> >      for_each_vcpu ( d, v )
> >          hvm_ioreq_server_remove_vcpu(s, v);
> > @@ -708,31 +820,373 @@ static void hvm_destroy_ioreq_server(struct
> domain *d)
> >      hvm_destroy_ioreq_page(s, 0);
> >
> >      xfree(s);
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >  }
> >
> > -static int hvm_get_ioreq_server_buf_port(struct domain *d,
> evtchn_port_t *port)
> > +static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t
> id,
> > +                                         evtchn_port_t *port)
> >  {
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct list_head *entry;
> > +    int rc;
> >
> > -    if ( !s )
> > -        return -ENOENT;
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        if ( s->id == id )
> > +        {
> > +            *port = s->buf_ioreq_evtchn;
> > +            rc = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, int
> buf,
> > +                                    xen_pfn_t *pfn)
> > +{
> > +    struct list_head *entry;
> > +    int rc;
> > +
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        if ( s->id == id )
> > +        {
> > +            int i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN :
> HVM_PARAM_IOREQ_PFN;
> > +
> > +            *pfn = d->arch.hvm_domain.params[i] - s->id;
> > +            rc = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_map_io_range_to_ioreq_server(struct domain *d,
> ioservid_t id,
> > +                                            int is_mmio, uint64_t start, uint64_t end)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct hvm_io_range *x;
> > +    struct list_head *list;
> > +    int rc;
> > +
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    x = xmalloc(struct hvm_io_range);
> > +    if ( x == NULL )
> > +        return -ENOMEM;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> > +    goto fail;
> > +
> > +found:
> > +    INIT_RCU_HEAD(&x->rcu);
> > +    x->start = start;
> > +    x->end = end;
> > +
> > +    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
> > +    list_add_rcu(&x->list_entry, list);
> > +
> > +    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
> > +             d->domain_id,
> > +             s->id,
> > +             ( is_mmio ) ? "MMIO" : "PORTIO",
> > +             x->start,
> > +             x->end);
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> > -    *port = s->buf_ioreq_evtchn;
> >      return 0;
> > +
> > +fail:
> > +    xfree(x);
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> >  }
> >
> > -static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t
> *pfn)
> > +static void free_io_range(struct rcu_head *rcu)
> >  {
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    int i;
> > +    struct hvm_io_range *x;
> >
> > -    if ( !s )
> > -        return -ENOENT;
> > +    x = container_of (rcu, struct hvm_io_range, rcu);
> > +
> > +    xfree(x);
> > +}
> > +
> > +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d,
> ioservid_t id,
> > +                                                int is_mmio, uint64_t start)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct list_head *list, *entry;
> > +    int rc;
> > +
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> > +    goto done;
> > +
> > +found:
> > +    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
> > +
> > +    list_for_each ( entry,
> > +                    list )
> > +    {
> > +        struct hvm_io_range *x = list_entry(entry,
> > +                                            struct hvm_io_range,
> > +                                            list_entry);
> > +
> > +        if ( start == x->start )
> > +        {
> > +            gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
> > +                     d->domain_id,
> > +                     s->id,
> > +                     ( is_mmio ) ? "MMIO" : "PORTIO",
> > +                     x->start,
> > +                     x->end);
> > +
> > +            list_del_rcu(&x->list_entry);
> > +            call_rcu(&x->rcu, free_io_range);
> >
> > -    i = ( buf ) ? HVM_PARAM_BUFIOREQ_PFN : HVM_PARAM_IOREQ_PFN;
> > -    *pfn = d->arch.hvm_domain.params[i];
> > +            rc = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t
> id,
> > +                                          uint16_t bdf)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct hvm_pcidev *x;
> > +    int rc;
> > +
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    x = xmalloc(struct hvm_pcidev);
> > +    if ( x == NULL )
> > +        return -ENOMEM;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> > +    goto fail;
> > +
> > +found:
> > +    INIT_RCU_HEAD(&x->rcu);
> > +    x->bdf = bdf;
> > +
> > +    list_add_rcu(&x->list_entry, &s->pcidev_list);
> > +
> > +    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
> > +             d->domain_id,
> > +             s->id,
> > +             x->bdf);
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> >      return 0;
> > +
> > +fail:
> > +    xfree(x);
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static void free_pcidev(struct rcu_head *rcu)
> > +{
> > +    struct hvm_pcidev *x;
> > +
> > +    x = container_of (rcu, struct hvm_pcidev, rcu);
> > +
> > +    xfree(x);
> > +}
> > +
> > +static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d,
> ioservid_t id,
> > +                                              uint16_t bdf)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct list_head *entry;
> > +    int rc;
> > +
> > +    if ( id >= d-
> >arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> > +    goto done;
> > +
> > +found:
> > +    list_for_each ( entry,
> > +                    &s->pcidev_list )
> > +    {
> > +        struct hvm_pcidev *x = list_entry(entry,
> > +                                          struct hvm_pcidev,
> > +                                          list_entry);
> > +
> > +        if ( bdf == x->bdf )
> > +        {
> > +            gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
> > +                     d->domain_id,
> > +                     s->id,
> > +                     x->bdf);
> > +
> > +            list_del_rcu(&x->list_entry);
> > +            call_rcu(&x->rcu, free_pcidev);
> > +
> > +            rc = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu
> *v)
> > +{
> > +    struct list_head *entry;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
> > +            goto fail;
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return 0;
> > +
> > +fail:
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
> > +{
> > +    struct list_head *entry;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +}
> > +
> > +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> > +{
> > +    ioservid_t id;
> > +
> > +    for ( id = 0;
> > +          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> > +          id++ )
> > +        hvm_destroy_ioreq_server(d, id);
> >  }
> >
> >  static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> > @@ -750,18 +1204,31 @@ static int hvm_replace_event_channel(struct
> vcpu *v, domid_t remote_domid,
> >      return 0;
> >  }
> >
> > -static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
> > +static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
> >  {
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> >      struct vcpu *v;
> >      int rc = 0;
> >
> > +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> >      domain_pause(d);
> >
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == id )
> > +            goto found;
> > +    }
> > +
> >      rc = -ENOENT;
> > -    if ( !s )
> > -        goto done;
> > +    goto done;
> >
> > +found:
> >      rc = 0;
> >      if ( s->domid == domid )
> >          goto done;
> > @@ -787,6 +1254,8 @@ static int hvm_set_ioreq_server_domid(struct
> domain *d, domid_t domid)
> >  done:
> >      domain_unpause(d);
> >
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> >      return rc;
> >  }
> >
> > @@ -817,6 +1286,9 @@ int hvm_domain_initialise(struct domain *d)
> >
> >      }
> >
> > +    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
> > +    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
> > +    spin_lock_init(&d->arch.hvm_domain.pci_lock);
> >      spin_lock_init(&d->arch.hvm_domain.irq_lock);
> >      spin_lock_init(&d->arch.hvm_domain.uc_lock);
> >
> > @@ -858,6 +1330,7 @@ int hvm_domain_initialise(struct domain *d)
> >      rtc_init(d);
> >
> >      register_portio_handler(d, 0xe9, 1, hvm_print_line);
> > +    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
> >
> >      rc = hvm_funcs.domain_initialise(d);
> >      if ( rc != 0 )
> > @@ -888,7 +1361,7 @@ void hvm_domain_relinquish_resources(struct
> domain *d)
> >      if ( hvm_funcs.nhvm_domain_relinquish_resources )
> >          hvm_funcs.nhvm_domain_relinquish_resources(d);
> >
> > -    hvm_destroy_ioreq_server(d);
> > +    hvm_destroy_all_ioreq_servers(d);
> >
> >      msixtbl_pt_cleanup(d);
> >
> > @@ -1520,7 +1993,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >  {
> >      int rc;
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> >
> >      hvm_asid_flush_vcpu(v);
> >
> > @@ -1563,12 +2035,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
> nestedhvm_vcpu_destroy */
> >          goto fail5;
> >
> > -    if ( s )
> > -    {
> > -        rc = hvm_ioreq_server_add_vcpu(s, v);
> > -        if ( rc < 0 )
> > -            goto fail6;
> > -    }
> > +    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
> > +    if ( rc < 0 )
> > +        goto fail6;
> >
> >      if ( v->vcpu_id == 0 )
> >      {
> > @@ -1604,10 +2073,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >  void hvm_vcpu_destroy(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> >
> > -    if ( s )
> > -        hvm_ioreq_server_remove_vcpu(s, v);
> > +    hvm_all_ioreq_servers_remove_vcpu(d, v);
> >
> >      nestedhvm_vcpu_destroy(v);
> >
> > @@ -1646,11 +2113,112 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > +static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
> > +
> > +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
> > +{
> > +#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
> > +
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +    uint8_t type;
> > +    uint64_t addr;
> > +
> > +    if ( p->type == IOREQ_TYPE_PIO &&
> > +         (p->addr & ~3) == 0xcfc )
> > +    {
> > +        /* PCI config data cycle */
> > +        type = IOREQ_TYPE_PCI_CONFIG;
> > +
> > +        spin_lock(&d->arch.hvm_domain.pci_lock);
> > +        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
> > +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> > +    }
> > +    else
> > +    {
> > +        type = p->type;
> > +        addr = p->addr;
> > +    }
> > +
> > +    rcu_read_lock(&ioreq_server_rcu_lock);
> > +
> > +    switch ( type )
> > +    {
> > +    case IOREQ_TYPE_COPY:
> > +    case IOREQ_TYPE_PIO:
> > +    case IOREQ_TYPE_PCI_CONFIG:
> > +        break;
> > +    default:
> > +        goto done;
> > +    }
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        switch ( type )
> > +        {
> > +            case IOREQ_TYPE_COPY:
> > +            case IOREQ_TYPE_PIO: {
> > +                struct list_head *list;
> > +                struct hvm_io_range *x;
> > +
> > +                list = ( type == IOREQ_TYPE_COPY ) ?
> > +                    &s->mmio_range_list :
> > +                    &s->portio_range_list;
> > +
> > +                list_for_each_entry ( x,
> > +                                      list,
> > +                                      list_entry )
> > +                {
> > +                    if ( (addr >= x->start) && (addr <= x->end) )
> > +                        goto found;
> > +                }
> > +                break;
> > +            }
> > +            case IOREQ_TYPE_PCI_CONFIG: {
> > +                struct hvm_pcidev *x;
> > +
> > +                list_for_each_entry ( x,
> > +                                      &s->pcidev_list,
> > +                                      list_entry )
> > +                {
> > +                    if ( BDF(addr) == x->bdf ) {
> > +                        p->type = type;
> > +                        p->addr = addr;
> > +                        goto found;
> > +                    }
> > +                }
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +done:
> > +    /* The catch-all server has id 0 */
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == 0 )
> > +            goto found;
> > +    }
> > +
> > +    s = NULL;
> > +
> > +found:
> > +    rcu_read_unlock(&ioreq_server_rcu_lock);
> > +
> > +    return s;
> > +
> > +#undef BDF
> > +}
> > +
> >  int hvm_buffered_io_send(ioreq_t *p)
> >  {
> >      struct vcpu *v = current;
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> >      struct hvm_ioreq_page *iorp;
> >      buffered_iopage_t *pg;
> >      buf_ioreq_t bp;
> > @@ -1660,6 +2228,7 @@ int hvm_buffered_io_send(ioreq_t *p)
> >      /* Ensure buffered_iopage fits in a page */
> >      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> >
> > +    s = hvm_select_ioreq_server(v, p);
> >      if ( !s )
> >          return 0;
> >
> > @@ -1770,18 +2339,34 @@ static bool_t
> hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
> >
> >  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
> >  {
> > -    struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> >
> >      if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> >          return 0;
> >
> > +    s = hvm_select_ioreq_server(v, p);
> >      if ( !s )
> >          return 0;
> >
> >      return hvm_send_assist_req_to_server(s, v, p);
> >  }
> >
> > +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct list_head *entry;
> > +
> > +    list_for_each ( entry,
> > +                    &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +
> > +        (void) hvm_send_assist_req_to_server(s, v, p);
> > +    }
> > +}
> > +
> >  void hvm_hlt(unsigned long rflags)
> >  {
> >      struct vcpu *curr = current;
> > @@ -4370,6 +4955,215 @@ static int hvmop_flush_tlb_all(void)
> >      return 0;
> >  }
> >
> > +static int hvmop_create_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> > +{
> > +    struct domain *curr_d = current->domain;
> > +    xen_hvm_create_ioreq_server_t op;
> > +    struct domain *d;
> > +    ioservid_t id;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = -ENOSPC;
> > +    for ( id = 1;
> > +          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> > +          id++ )
> > +    {
> > +        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
> > +        if ( rc == -EEXIST )
> > +            continue;
> > +
> > +        break;
> > +    }
> > +
> > +    if ( rc == -EEXIST )
> > +        rc = -ENOSPC;
> > +
> > +    if ( rc < 0 )
> > +        goto out;
> > +
> > +    op.id = id;
> > +
> > +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_get_ioreq_server_info(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
> > +{
> > +    xen_hvm_get_ioreq_server_info_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
> > +        goto out;
> > +
> > +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
> > +        goto out;
> > +
> > +    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
> > +        goto out;
> > +
> > +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_map_io_range_to_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
> > +{
> > +    xen_hvm_map_io_range_to_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
> > +                                          op.start, op.end);
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_unmap_io_range_from_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
> > +{
> > +    xen_hvm_unmap_io_range_from_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
> > +                                              op.start);
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_map_pcidev_to_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
> > +{
> > +    xen_hvm_map_pcidev_to_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_unmap_pcidev_from_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
> > +{
> > +    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +static int hvmop_destroy_ioreq_server(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
> > +{
> > +    xen_hvm_destroy_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    hvm_destroy_ioreq_server(d, op.id);
> > +    rc = 0;
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> >  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void)
> arg)
> >
> >  {
> > @@ -4378,6 +5172,41 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >
> >      switch ( op )
> >      {
> > +    case HVMOP_create_ioreq_server:
> > +        rc = hvmop_create_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_get_ioreq_server_info:
> > +        rc = hvmop_get_ioreq_server_info(
> > +            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
> > +        break;
> > +
> > +    case HVMOP_map_io_range_to_ioreq_server:
> > +        rc = hvmop_map_io_range_to_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_unmap_io_range_from_ioreq_server:
> > +        rc = hvmop_unmap_io_range_from_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_map_pcidev_to_ioreq_server:
> > +        rc = hvmop_map_pcidev_to_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_unmap_pcidev_from_ioreq_server:
> > +        rc = hvmop_unmap_pcidev_from_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
> > +        break;
> > +
> > +    case HVMOP_destroy_ioreq_server:
> > +        rc = hvmop_destroy_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> > +        break;
> > +
> >      case HVMOP_set_param:
> >      case HVMOP_get_param:
> >      {
> > @@ -4466,9 +5295,9 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  if ( a.value == DOMID_SELF )
> >                      a.value = curr_d->domain_id;
> >
> > -                rc = hvm_create_ioreq_server(d, a.value);
> > +                rc = hvm_create_ioreq_server(d, 0, a.value);
> >                  if ( rc == -EEXIST )
> > -                    rc = hvm_set_ioreq_server_domid(d, a.value);
> > +                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
> >                  break;
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  /* Not reflexive, as we must domain_pause(). */
> > @@ -4533,6 +5362,10 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  if ( a.value > SHUTDOWN_MAX )
> >                      rc = -EINVAL;
> >                  break;
> > +            case HVM_PARAM_NR_IOREQ_SERVERS:
> > +                if ( d == current->domain )
> > +                    rc = -EPERM;
> > +                break;
> >              }
> >
> >              if ( rc == 0 )
> > @@ -4567,7 +5400,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >              case HVM_PARAM_BUFIOREQ_PFN:
> >              case HVM_PARAM_BUFIOREQ_EVTCHN:
> >                  /* May need to create server */
> > -                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> > +                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
> >                  if ( rc != 0 && rc != -EEXIST )
> >                      goto param_fail;
> >
> > @@ -4576,7 +5409,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_IOREQ_PFN: {
> >                      xen_pfn_t pfn;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = pfn;
> > @@ -4585,7 +5418,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_BUFIOREQ_PFN: {
> >                      xen_pfn_t pfn;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = pfn;
> > @@ -4594,7 +5427,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_BUFIOREQ_EVTCHN: {
> >                      evtchn_port_t port;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = port;
> > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> > index c9adb94..ac0d867 100644
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -75,7 +75,7 @@ void send_invalidate_req(void)
> >          .data = ~0UL, /* flush all */
> >      };
> >
> > -    (void)hvm_send_assist_req(v, &p);
> > +    hvm_broadcast_assist_req(v, &p);
> >  }
> >
> >  int handle_mmio(void)
> > diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> > index a77b83d..e9da543 100644
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -41,17 +41,38 @@ struct hvm_ioreq_page {
> >      void *va;
> >  };
> >
> > +struct hvm_io_range {
> > +    struct list_head    list_entry;
> > +    uint64_t            start, end;
> > +    struct rcu_head     rcu;
> > +};
> > +
> > +struct hvm_pcidev {
> > +    struct list_head    list_entry;
> > +    uint16_t            bdf;
> > +    struct rcu_head     rcu;
> > +};
> > +
> >  struct hvm_ioreq_server {
> > +    struct list_head       list_entry;
> > +    ioservid_t             id;
> >      struct domain          *domain;
> >      domid_t                domid;
> >      struct hvm_ioreq_page  ioreq;
> >      int                    ioreq_evtchn[MAX_HVM_VCPUS];
> >      struct hvm_ioreq_page  buf_ioreq;
> >      int                    buf_ioreq_evtchn;
> > +    struct list_head       mmio_range_list;
> > +    struct list_head       portio_range_list;
> > +    struct list_head       pcidev_list;
> >  };
> >
> >  struct hvm_domain {
> > -    struct hvm_ioreq_server *ioreq_server;
> > +    struct list_head        ioreq_server_list;
> > +    spinlock_t              ioreq_server_lock;
> > +    uint32_t                pci_cf8;
> > +    spinlock_t              pci_lock;
> > +
> >      struct pl_time         pl_time;
> >
> >      struct hvm_io_handler *io_handler;
> > diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-
> x86/hvm/hvm.h
> > index 40aeddf..4118669 100644
> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -229,6 +229,7 @@ int prepare_ring_for_helper(struct domain *d,
> unsigned long gmfn,
> >  void destroy_ring_for_helper(void **_va, struct page_info *page);
> >
> >  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> > +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
> >
> >  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
> >  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index a9aab4b..6b31189 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -23,6 +23,7 @@
> >
> >  #include "../xen.h"
> >  #include "../trace.h"
> > +#include "../event_channel.h"
> >
> >  /* Get/set subcommands: extra argument == pointer to xen_hvm_param
> struct. */
> >  #define HVMOP_set_param           0
> > @@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
> >  typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
> >
> > +typedef uint32_t ioservid_t;
> > +
> > +DEFINE_XEN_GUEST_HANDLE(ioservid_t);
> > +
> > +#define HVMOP_create_ioreq_server 17
> > +struct xen_hvm_create_ioreq_server {
> > +    domid_t domid;  /* IN - domain to be serviced */
> > +    ioservid_t id;  /* OUT - server id */
> > +};
> > +typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
> > +
> > +#define HVMOP_get_ioreq_server_info 18
> > +struct xen_hvm_get_ioreq_server_info {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    ioservid_t id;          /* IN - server id */
> > +    xen_pfn_t pfn;          /* OUT - ioreq pfn */
> > +    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
> > +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> > +};
> > +typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> > +
> > +#define HVMOP_map_io_range_to_ioreq_server 19
> > +struct xen_hvm_map_io_range_to_ioreq_server {
> > +    domid_t domid;                  /* IN - domain to be serviced */
> > +    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> > +    int is_mmio;                    /* IN - MMIO or port IO? */
> > +    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> > +};
> > +typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
> > +
> > +#define HVMOP_unmap_io_range_from_ioreq_server 20
> > +struct xen_hvm_unmap_io_range_from_ioreq_server {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint8_t is_mmio;        /* IN - MMIO or port IO? */
> > +    uint64_aligned_t start; /* IN - start address of the range to remove */
> > +};
> > +typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
> > +
> > +#define HVMOP_map_pcidev_to_ioreq_server 21
> > +struct xen_hvm_map_pcidev_to_ioreq_server {
> > +    domid_t domid;      /* IN - domain to be serviced */
> > +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> > +};
> > +typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
> > +
> > +#define HVMOP_unmap_pcidev_from_ioreq_server 22
> > +struct xen_hvm_unmap_pcidev_from_ioreq_server {
> > +    domid_t domid;      /* IN - domain to be serviced */
> > +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> > +};
> > +typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
> > +
> > +#define HVMOP_destroy_ioreq_server 23
> > +struct xen_hvm_destroy_ioreq_server {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    ioservid_t id;          /* IN - server id */
> > +};
> > +typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
> > +
> >  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
> >
> >  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> > diff --git a/xen/include/public/hvm/ioreq.h
> b/xen/include/public/hvm/ioreq.h
> > index f05d130..e84fa75 100644
> > --- a/xen/include/public/hvm/ioreq.h
> > +++ b/xen/include/public/hvm/ioreq.h
> > @@ -34,6 +34,7 @@
> >
> >  #define IOREQ_TYPE_PIO          0 /* pio */
> >  #define IOREQ_TYPE_COPY         1 /* mmio ops */
> > +#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
> >  #define IOREQ_TYPE_TIMEOFFSET   7
> >  #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
> >
> > diff --git a/xen/include/public/hvm/params.h
> b/xen/include/public/hvm/params.h
> > index 517a184..4109b11 100644
> > --- a/xen/include/public/hvm/params.h
> > +++ b/xen/include/public/hvm/params.h
> > @@ -145,6 +145,8 @@
> >  /* SHUTDOWN_* action in case of a triple fault */
> >  #define HVM_PARAM_TRIPLE_FAULT_REASON 31
> >
> > -#define HVM_NR_PARAMS          32
> > +#define HVM_PARAM_NR_IOREQ_SERVERS 32
> > +
> > +#define HVM_NR_PARAMS          33
> >
> >  #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-04 11:40 ` [PATCH v2 5/6] ioreq-server: add support for multiple servers Paul Durrant
  2014-03-04 12:06   ` Andrew Cooper
@ 2014-03-10 18:41   ` George Dunlap
  2014-03-11 10:41     ` Paul Durrant
  1 sibling, 1 reply; 20+ messages in thread
From: George Dunlap @ 2014-03-10 18:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel@lists.xen.org

On Tue, Mar 4, 2014 at 11:40 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index 1f6ce50..3116653 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -746,6 +746,7 @@ typedef struct {
>      uint64_t acpi_ioport_location;
>      uint64_t viridian;
>      uint64_t vm_generationid_addr;
> +    uint64_t nr_ioreq_servers;
>
>      struct toolstack_data_t tdata;
>  } pagebuf_t;
> @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
>          DPRINTF("read generation id buffer address");
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>
> +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> +        /* Skip padding 4 bytes then read the acpi ioport location. */

This comment might be confusing. :-)
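Presumably a copy-and-paste slip; something along these lines is probably
what was meant:

        /* Skip padding 4 bytes then read the number of ioreq servers. */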

> +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> +        {
> +            PERROR("error reading the number of IOREQ servers");
> +            return -1;
> +        }
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
>


> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index b65e702..6d6328a 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -45,7 +45,7 @@
>  #define SPECIALPAGE_IDENT_PT 4
>  #define SPECIALPAGE_CONSOLE  5
>  #define SPECIALPAGE_IOREQ    6
> -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */

"each ioreq server needs 2 pages"?

>  #define special_pfn(x) (0xff000u - 1 - (x))
>
>  #define VGA_HOLE_SIZE (0x20)
>

[snip]

> @@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
>      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
>                       special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);

Similarly, special_pfn(SPECIALPAGE_IOREQ+max_emulators)?

Although actually, are you planning to make it possible to add more
emulators (above "max_emulators") dynamically after the VM is created
-- maybe in a future series?

If not, and you're always going to be statically allocating a fixed
number of emulators at the beginning, there's not actually a reason to
change the direction that the special PFNs go at all.

> +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                     max_emulators);
>
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when

[snip]

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4fc46eb..cf9b67d 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1750,6 +1750,9 @@ skip_vfb:
>
>              b_info->u.hvm.vendor_device = d;
>          }
> +
> +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> +            b_info->u.hvm.max_emulators = l + 1;

Do we want to give this a more structured naming convention?

device_model_secondary_max?  device_model_secondary_emulators?

Also, how are you planning on starting these secondary emulators?
Would it make sense for libxl to start them, in which case it should
be able to do its own counting?  Or are you envisioning starting /
destroying secondary emulators as the guest is running?

>      }
>
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index fb2dd73..e8b73fa 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
>  bool_t hvm_io_pending(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -    ioreq_t *p;
> +    struct list_head *entry;
>
> -    if ( !s )
> -        return 0;
> +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                list_entry);
> +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
>
> -    p = get_ioreq(s, v->vcpu_id);
> -    return ( p->state != STATE_IOREQ_NONE );
> +        p = get_ioreq(s, v->vcpu_id);
> +        if ( p->state != STATE_IOREQ_NONE )
> +            return 1;

Redundant calls to get_ioreq().
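i.e. the loop body could just be:

        ioreq_t *p = get_ioreq(s, v->vcpu_id);

        if ( p->state != STATE_IOREQ_NONE )
            return 1;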

> +    }
> +
> +    return 0;
>  }
>
>  static void hvm_wait_on_io(struct domain *d, ioreq_t *p)

[snip]

> +static int hvm_access_cf8(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)

I take it this is part of virtualizing the pci space?

This wasn't mentioned in the commit message; it seems like it probably
should have been introduced in a separate patch.

> +{
> +    struct vcpu *curr = current;
> +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> +    int rc;
> +
> +    BUG_ON(port < 0xcf8);
> +    port -= 0xcf8;
> +
> +    spin_lock(&hd->pci_lock);
> +
> +    if ( dir == IOREQ_WRITE )
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            hd->pci_cf8 = *val;
> +            break;
> +
> +        case 2:
> +        {
> +            uint32_t mask = 0xffff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +
> +        case 1:
> +        {
> +            uint32_t mask = 0xff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +
> +        default:
> +            break;
> +        }
> +
> +        /* We always need to fall through to the catch all emulator */
> +        rc = X86EMUL_UNHANDLEABLE;
> +    }
> +    else
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            *val = hd->pci_cf8;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        case 2:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        case 1:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        default:
> +            rc = X86EMUL_UNHANDLEABLE;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&hd->pci_lock);
> +
> +    return rc;
> +}
> +
>  static int handle_pvh_io(
>      int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>  {
> @@ -618,39 +704,53 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
>      }
>  }
>
> -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
>  {
>      struct hvm_ioreq_server *s;
>      unsigned long pfn;
>      struct vcpu *v;
>      int i, rc;
>
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);

Hmm, so for a few patches we're just completely lockless?

Regressions like that can wreak havoc on bisections.

[snip]

> @@ -1646,11 +2113,112 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>
> +static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
> +
> +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
> +{
> +#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
> +
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +
> +    if ( p->type == IOREQ_TYPE_PIO &&
> +         (p->addr & ~3) == 0xcfc )
> +    {
> +        /* PCI config data cycle */
> +        type = IOREQ_TYPE_PCI_CONFIG;
> +
> +        spin_lock(&d->arch.hvm_domain.pci_lock);
> +        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
> +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> +    }
> +    else
> +    {
> +        type = p->type;
> +        addr = p->addr;
> +    }
> +
> +    rcu_read_lock(&ioreq_server_rcu_lock);
> +
> +    switch ( type )
> +    {
> +    case IOREQ_TYPE_COPY:
> +    case IOREQ_TYPE_PIO:
> +    case IOREQ_TYPE_PCI_CONFIG:
> +        break;
> +    default:
> +        goto done;
> +    }
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        switch ( type )
> +        {
> +            case IOREQ_TYPE_COPY:
> +            case IOREQ_TYPE_PIO: {
> +                struct list_head *list;
> +                struct hvm_io_range *x;
> +
> +                list = ( type == IOREQ_TYPE_COPY ) ?
> +                    &s->mmio_range_list :
> +                    &s->portio_range_list;
> +
> +                list_for_each_entry ( x,
> +                                      list,
> +                                      list_entry )
> +                {
> +                    if ( (addr >= x->start) && (addr <= x->end) )
> +                        goto found;
> +                }
> +                break;
> +            }
> +            case IOREQ_TYPE_PCI_CONFIG: {
> +                struct hvm_pcidev *x;
> +
> +                list_for_each_entry ( x,
> +                                      &s->pcidev_list,
> +                                      list_entry )
> +                {
> +                    if ( BDF(addr) == x->bdf ) {
> +                        p->type = type;
> +                        p->addr = addr;
> +                        goto found;
> +                    }
> +                }
> +                break;
> +            }
> +        }
> +    }
> +
> +done:
> +    /* The catch-all server has id 0 */
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          list_entry )
> +    {
> +        if ( s->id == 0 )
> +            goto found;
> +    }

This is an awful lot of code to go through for every single IO,
particularly if the common case is that there's only a single ioreq
server.  Have you done any performance tests with this on a workload
that has a high IO count?

I realize that the cost of going all the way to qemu and back will
still dominate the time, but I can't help but think this might add up,
and I wonder if having a fast-path for max_emulators=1 on some of these
potentially hot paths would make sense.

> @@ -4466,9 +5295,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value == DOMID_SELF )
>                      a.value = curr_d->domain_id;
>
> -                rc = hvm_create_ioreq_server(d, a.value);
> +                rc = hvm_create_ioreq_server(d, 0, a.value);
>                  if ( rc == -EEXIST )
> -                    rc = hvm_set_ioreq_server_domid(d, a.value);
> +                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
>                  break;

Is there a plan to deprecate this old way of creating ioreq_server 0
at some point, so we can get rid of these HVM params?

Obviously we'll need to handle incoming migration from domains one
release back, but after that we should be able to get rid of them,
right?

 -George

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-10 18:41   ` George Dunlap
@ 2014-03-11 10:41     ` Paul Durrant
  2014-03-11 10:52       ` Paul Durrant
  2014-03-11 16:48       ` George Dunlap
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-11 10:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 10 March 2014 18:41
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v2 5/6] ioreq-server: add support for
> multiple servers
> 
> On Tue, Mar 4, 2014 at 11:40 AM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > diff --git a/tools/libxc/xc_domain_restore.c
> b/tools/libxc/xc_domain_restore.c
> > index 1f6ce50..3116653 100644
> > --- a/tools/libxc/xc_domain_restore.c
> > +++ b/tools/libxc/xc_domain_restore.c
> > @@ -746,6 +746,7 @@ typedef struct {
> >      uint64_t acpi_ioport_location;
> >      uint64_t viridian;
> >      uint64_t vm_generationid_addr;
> > +    uint64_t nr_ioreq_servers;
> >
> >      struct toolstack_data_t tdata;
> >  } pagebuf_t;
> > @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch,
> struct restore_ctx *ctx,
> >          DPRINTF("read generation id buffer address");
> >          return pagebuf_get_one(xch, ctx, buf, fd, dom);
> >
> > +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> > +        /* Skip padding 4 bytes then read the acpi ioport location. */
> 
> This comment might be confusing. :-)
> 

Oops. Sorry about that.

> > +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> > +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> > +        {
> > +            PERROR("error reading the number of IOREQ servers");
> > +            return -1;
> > +        }
> > +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> > +
> >      default:
> >          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
> >              ERROR("Max batch size exceeded (%d). Giving up.", count);
> >
> 
> 
> > diff --git a/tools/libxc/xc_hvm_build_x86.c
> b/tools/libxc/xc_hvm_build_x86.c
> > index b65e702..6d6328a 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -45,7 +45,7 @@
> >  #define SPECIALPAGE_IDENT_PT 4
> >  #define SPECIALPAGE_CONSOLE  5
> >  #define SPECIALPAGE_IOREQ    6
> > -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> > +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
> 
> "each ioreq server needs 2 pages"?
> 

That's the intent. The line is getting rather long though.

> >  #define special_pfn(x) (0xff000u - 1 - (x))
> >
> >  #define VGA_HOLE_SIZE (0x20)
> >
> 
> [snip]
> 
> > @@ -515,7 +521,9 @@ static int setup_guest(xc_interface *xch,
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> >                       special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> > +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
> 
> Similarly, special_pfn(SPECIALPAGE_IOREQ+max_emulators)?
> 

No, it is slightly confusing, so it is worthy of a comment (which I'll add in the next version). The pages are split into two sets: the first half are the synchronous ioreq pages, the second half are for buffered ioreqs. So this PFN needs to be the half-way point.
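To spell out the layout I have in mind (this is only a sketch of the intent,
not the literal code or comment that will go into xc_hvm_build_x86.c):

    /*
     * With max_emulators == n, special_pfn() counts downwards from
     * special_pfn(SPECIALPAGE_IOREQ):
     *
     *   special_pfn(SPECIALPAGE_IOREQ) .. special_pfn(SPECIALPAGE_IOREQ + n - 1)
     *       n synchronous ioreq pages, one per server
     *   special_pfn(SPECIALPAGE_IOREQ + n) .. special_pfn(SPECIALPAGE_IOREQ + 2n - 1)
     *       n buffered ioreq pages, one per server
     *
     * so the first buffered page - the half-way point - is
     * special_pfn(SPECIALPAGE_IOREQ) - n, which is what gets written
     * into HVM_PARAM_BUFIOREQ_PFN:
     */
    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
                     special_pfn(SPECIALPAGE_IOREQ));
    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);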

> Although actually, are you planning to make it possible to add more
> emulators (above "max_emulators") dynamically after the VM is created
> -- maybe in a future series?
> 
> If not, and you're always going to be statically allocating a fixed
> number of emulators at the beginning, there's not actually a reason to
> change the direction that the special PFNs go at all.
> 

I was slightly paranoid about some of the PFNs moving if we ever did increase the number of reserved special pages. We've seen breakage in old Windows PV drivers when that happened. So I thought it better to change the arrangement once; then, if we did want to add emulators during a migration (or save/restore), we could do it without e.g. the store ring moving.

> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > +                     max_emulators);
> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> 
> [snip]
> 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 4fc46eb..cf9b67d 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1750,6 +1750,9 @@ skip_vfb:
> >
> >              b_info->u.hvm.vendor_device = d;
> >          }
> > +
> > +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> > +            b_info->u.hvm.max_emulators = l + 1;
> 
> Do we want to give this a more structured naming convention?
> 
> device_model_secondary_max?  device_model_secondary_emulators?
> 

It was just a name I chose. I'm happy to change it... perhaps device_model_max? (Defaults to 1). It's a bit shorter to type.
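For reference, in the series as posted it's just an xl config line, e.g.:

    secondary_device_emulators = 2

which, given the l + 1 in the hunk you quoted, ends up as
b_info->u.hvm.max_emulators = 3 (QEMU plus two secondary emulators).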

> Also, how are you planning on starting these secondary emulators?
> Would it make sense for libxl to start them, in which case it should
> be able to do its own counting?  Or are you envisioning starting /
> destroying secondary emulators as the guest is running?
> 

That's an open question at the moment. I coded this series such that a secondary emulator can be started after the VM and hotplug its device. For some emulators (e.g. the one I'm working on to supply a console) it makes more sense for libxl to start it - but I see that as being additional to this series. I don't think we want to stipulate that libxl is the only way to kick off a secondary emulator.

> >      }
> >
> >      xlu_cfg_destroy(config);
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index fb2dd73..e8b73fa 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct
> hvm_ioreq_server *s, int id)
> >  bool_t hvm_io_pending(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -    ioreq_t *p;
> > +    struct list_head *entry;
> >
> > -    if ( !s )
> > -        return 0;
> > +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
> > +    {
> > +        struct hvm_ioreq_server *s = list_entry(entry,
> > +                                                struct hvm_ioreq_server,
> > +                                                list_entry);
> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> >
> > -    p = get_ioreq(s, v->vcpu_id);
> > -    return ( p->state != STATE_IOREQ_NONE );
> > +        p = get_ioreq(s, v->vcpu_id);
> > +        if ( p->state != STATE_IOREQ_NONE )
> > +            return 1;
> 
> Redundant calls to get_ioreq().

Hmm. Looks like a patch rebase went a bit wrong there. Good spot.

> 
> > +    }
> > +
> > +    return 0;
> >  }
> >
> >  static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> 
> [snip]
> 
> > +static int hvm_access_cf8(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> 
> I take it this is part of virtualizing the pci space?
>

Yes, that needs to be done once you have more than one emulator.
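To illustrate (the numbers here are just an example): once Xen latches the
cf8 value, a subsequent data cycle on 0xcfc can be routed by BDF rather
than going to the catch-all emulator unconditionally:

    /*
     * e.g. the guest writes 0x80001808 to port 0xcf8:
     *   enable=1, bus=0, dev=3, fn=0, reg=0x08
     * The following access to 0xcfc is turned into an
     * IOREQ_TYPE_PCI_CONFIG cycle and matched using
     *   BDF(0x80001808) == 0x0018   (i.e. 00:03.0)
     * against each server's pcidev_list; if no server has claimed that
     * BDF it falls through to the catch-all server (id 0).
     */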
 
> This wasn't mentioned in the commit message; it seems like it probably
> should have been introduced in a separate patch.
> 

It's not actually needed until you have more than one emulator though, so it doesn't really make sense to separate it. I'll amend the commit message to point out that, for secondary emulators, IO ranges and PCI devices need to be explicitly registered.
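Roughly, a secondary emulator's start-of-day then looks like this (sketched
directly in terms of the hypercall structures in this patch; do_hvmop() is
a made-up stand-in for whatever xenctrl wrapper ends up being used):

    xen_hvm_create_ioreq_server_t create = {
        .domid = guest_domid,               /* domain to be serviced */
    };
    do_hvmop(HVMOP_create_ioreq_server, &create);   /* returns create.id */

    xen_hvm_get_ioreq_server_info_t info = {
        .domid = guest_domid,
        .id    = create.id,
    };
    do_hvmop(HVMOP_get_ioreq_server_info, &info);
    /* map info.pfn and info.buf_pfn, bind info.buf_port, then... */

    xen_hvm_map_pcidev_to_ioreq_server_t map = {
        .domid = guest_domid,
        .id    = create.id,
        .bdf   = 0x0018,                    /* e.g. 00:03.0 */
    };
    do_hvmop(HVMOP_map_pcidev_to_ioreq_server, &map);

(and similarly HVMOP_map_io_range_to_ioreq_server for any MMIO or port IO
BARs it emulates).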

> > +{
> > +    struct vcpu *curr = current;
> > +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> > +    int rc;
> > +
> > +    BUG_ON(port < 0xcf8);
> > +    port -= 0xcf8;
> > +
> > +    spin_lock(&hd->pci_lock);
> > +
> > +    if ( dir == IOREQ_WRITE )
> > +    {
> > +        switch ( bytes )
> > +        {
> > +        case 4:
> > +            hd->pci_cf8 = *val;
> > +            break;
> > +
> > +        case 2:
> > +        {
> > +            uint32_t mask = 0xffff << (port * 8);
> > +            uint32_t subval = *val << (port * 8);
> > +
> > +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> > +                          (subval & mask);
> > +            break;
> > +        }
> > +
> > +        case 1:
> > +        {
> > +            uint32_t mask = 0xff << (port * 8);
> > +            uint32_t subval = *val << (port * 8);
> > +
> > +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> > +                          (subval & mask);
> > +            break;
> > +        }
> > +
> > +        default:
> > +            break;
> > +        }
> > +
> > +        /* We always need to fall through to the catch all emulator */
> > +        rc = X86EMUL_UNHANDLEABLE;
> > +    }
> > +    else
> > +    {
> > +        switch ( bytes )
> > +        {
> > +        case 4:
> > +            *val = hd->pci_cf8;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        case 2:
> > +            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        case 1:
> > +            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
> > +            rc = X86EMUL_OKAY;
> > +            break;
> > +
> > +        default:
> > +            rc = X86EMUL_UNHANDLEABLE;
> > +            break;
> > +        }
> > +    }
> > +
> > +    spin_unlock(&hd->pci_lock);
> > +
> > +    return rc;
> > +}
> > +
> >  static int handle_pvh_io(
> >      int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> >  {
> > @@ -618,39 +704,53 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
> >      }
> >  }
> >
> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
> >  {
> >      struct hvm_ioreq_server *s;
> >      unsigned long pfn;
> >      struct vcpu *v;
> >      int i, rc;
> >
> > +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> 
> Hmm, so for a few patches we're just completely lockless?
> 

Yes. The lock is not needed until this point - the single on-demand emulator only ever appears; it never disappears. Secondary emulators can come and go.

> Regressions like that can wreak havoc on bisections.
> 
> [snip]
> 
> > @@ -1646,11 +2113,112 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > +static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
> > +
> > +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
> > +{
> > +#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
> > +
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +    uint8_t type;
> > +    uint64_t addr;
> > +
> > +    if ( p->type == IOREQ_TYPE_PIO &&
> > +         (p->addr & ~3) == 0xcfc )
> > +    {
> > +        /* PCI config data cycle */
> > +        type = IOREQ_TYPE_PCI_CONFIG;
> > +
> > +        spin_lock(&d->arch.hvm_domain.pci_lock);
> > +        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
> > +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> > +    }
> > +    else
> > +    {
> > +        type = p->type;
> > +        addr = p->addr;
> > +    }
> > +
> > +    rcu_read_lock(&ioreq_server_rcu_lock);
> > +
> > +    switch ( type )
> > +    {
> > +    case IOREQ_TYPE_COPY:
> > +    case IOREQ_TYPE_PIO:
> > +    case IOREQ_TYPE_PCI_CONFIG:
> > +        break;
> > +    default:
> > +        goto done;
> > +    }
> > +
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        switch ( type )
> > +        {
> > +            case IOREQ_TYPE_COPY:
> > +            case IOREQ_TYPE_PIO: {
> > +                struct list_head *list;
> > +                struct hvm_io_range *x;
> > +
> > +                list = ( type == IOREQ_TYPE_COPY ) ?
> > +                    &s->mmio_range_list :
> > +                    &s->portio_range_list;
> > +
> > +                list_for_each_entry ( x,
> > +                                      list,
> > +                                      list_entry )
> > +                {
> > +                    if ( (addr >= x->start) && (addr <= x->end) )
> > +                        goto found;
> > +                }
> > +                break;
> > +            }
> > +            case IOREQ_TYPE_PCI_CONFIG: {
> > +                struct hvm_pcidev *x;
> > +
> > +                list_for_each_entry ( x,
> > +                                      &s->pcidev_list,
> > +                                      list_entry )
> > +                {
> > +                    if ( BDF(addr) == x->bdf ) {
> > +                        p->type = type;
> > +                        p->addr = addr;
> > +                        goto found;
> > +                    }
> > +                }
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +done:
> > +    /* The catch-all server has id 0 */
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server_list,
> > +                          list_entry )
> > +    {
> > +        if ( s->id == 0 )
> > +            goto found;
> > +    }
> 
> This is an awful lot of code to go through for every single IO,
> particularly if the common case is that there's only a single ioreq
> server.  Have you done any performance tests with this on a workload
> that has a high IO count?
> 
> I realize that the cost of going all the way to qemu and back will
> still dominate the time, but I can't help but think this might add up,
> and I wonder if having a fast-path for max_emulators=1 would make
> sense on some of these potentially hot paths would make sense.

No, I don't have such numbers. As you say, the cost of waking up QEMU will dominate massively. Optimizing for a single list entry sounds like a good idea though - I'll do that.
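Something along these lines, perhaps (untested, and it assumes we keep a
count of registered servers, which the patch doesn't currently do):

    /* Hypothetical fast path at the top of hvm_select_ioreq_server():
     * if only the catch-all server exists, skip the range/BDF matching
     * altogether. nr_ioreq_servers would be a new field. */
    if ( d->arch.hvm_domain.nr_ioreq_servers == 1 )
        return list_entry(d->arch.hvm_domain.ioreq_server_list.next,
                          struct hvm_ioreq_server, list_entry);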

> 
> > @@ -4466,9 +5295,9 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  if ( a.value == DOMID_SELF )
> >                      a.value = curr_d->domain_id;
> >
> > -                rc = hvm_create_ioreq_server(d, a.value);
> > +                rc = hvm_create_ioreq_server(d, 0, a.value);
> >                  if ( rc == -EEXIST )
> > -                    rc = hvm_set_ioreq_server_domid(d, a.value);
> > +                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
> >                  break;
> 
> Is there a plan to deprecate this old way of creating ioreq_server 0
> at some point, so we can get rid of these HVM params?

There's no plan as such. Once this patch series is in though, we could modify QEMU to use the new API such that server 0 is never created. Then there'd need to be some sort of deprecation and eventual removal. It would take a while.

> 
> Obviously we'll need to handle incoming migration from domains one
> release back, but after that we should be able to get rid of them,
> right?

We'd probably want to support old versions of QEMU for a while, right? The params (and the catch-all server) need to stick around for as long as they do.

  Paul

> 
>  -George

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-11 10:41     ` Paul Durrant
@ 2014-03-11 10:52       ` Paul Durrant
  2014-03-11 16:48       ` George Dunlap
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-11 10:52 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
[snip]
> > Similarly, special_pfn(SPECIALPAGE_IOREQ+max_emulators)?
> >
> 
> No, it is slightly confusing so is worthy of a comment (which I'll add in the
> next version). The pages are split into 2 sets. The first half are the
> synchronous ioreq pages, the second half are for buffered ioreqs. So this
> PFN needs to be the half-way point.
> 

Sorry, I was getting confused. You're right. All the more reason for the comment ;-)

  Paul

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-11 10:41     ` Paul Durrant
  2014-03-11 10:52       ` Paul Durrant
@ 2014-03-11 16:48       ` George Dunlap
  2014-03-11 17:32         ` Paul Durrant
  1 sibling, 1 reply; 20+ messages in thread
From: George Dunlap @ 2014-03-11 16:48 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel@lists.xen.org

On Tue, Mar 11, 2014 at 10:41 AM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>
>> > diff --git a/tools/libxc/xc_hvm_build_x86.c
>> b/tools/libxc/xc_hvm_build_x86.c
>> > index b65e702..6d6328a 100644
>> > --- a/tools/libxc/xc_hvm_build_x86.c
>> > +++ b/tools/libxc/xc_hvm_build_x86.c
>> > @@ -45,7 +45,7 @@
>> >  #define SPECIALPAGE_IDENT_PT 4
>> >  #define SPECIALPAGE_CONSOLE  5
>> >  #define SPECIALPAGE_IOREQ    6
>> > -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server
>> needs 2 pages */
>> > +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq
>> server needs 2 pages */
>>
>> "each ioreq server needs 2 pages"?
>>
>
> That's the intent. The line is getting rather long though.


Wouldn't hurt to put it on a separate line, then.
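
E.g. something along these lines (with the comment on its own line, and the expansion parenthesised for good measure):

  /* Each ioreq server needs 2 special pages: one for synchronous
   * ioreqs and one for buffered ioreqs. */
  #define NR_SPECIAL_PAGES(n)  (SPECIALPAGE_IOREQ + (2 * (n)))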

>> Although actually, are you planning to make it possible to add more
>> emulators (above "max_emulators") dynamically after the VM is created
>> -- maybe in a future series?
>>
>> If not, and you're always going to be statically allocating a fixed
>> number of emulators at the beginning, there's not actually a reason to
>> change the direction that the special PFNs go at all.
>>
>
> I was slightly paranoid about some of the PFNs moving if we ever did increase the number of reserved special pages. We've seen breakage in old Windows PV drivers when that happened. So I thought it better to change the arrangement once, and then if we did want to add emulators during a migration (or save/restore) we could do it without e.g. the store ring moving.

So if you're afraid of implicit dependencies, one way to deal with it
is to try to avoid moving them at all; the other way is to move them
around all the time, to shake out implicit dependencies as early as
possible. :-)  (You could have XenRT tests, for instance, that set
max_emulators to {1,2,3,4}.)

>
>> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
>> > +                     max_emulators);
>> >
>> >      /*
>> >       * Identity-map page table is required for running with CR0.PG=0 when
>>
>> [snip]
>>
>> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> > index 4fc46eb..cf9b67d 100644
>> > --- a/tools/libxl/xl_cmdimpl.c
>> > +++ b/tools/libxl/xl_cmdimpl.c
>> > @@ -1750,6 +1750,9 @@ skip_vfb:
>> >
>> >              b_info->u.hvm.vendor_device = d;
>> >          }
>> > +
>> > +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
>> > +            b_info->u.hvm.max_emulators = l + 1;
>>
>> Do we want to give this a more structured naming convention?
>>
>> device_model_secondary_max?  device_model_secondary_emulators?
>>
>
> It was just a name I chose. I'm happy to change it... perhaps device_model_max? (Defaults to 1). It's a bit shorter to type.

"device_model_max" doesn't say what the "max" is.

>
>> Also, how are you planning on starting these secondary emulators?
>> Would it make sense for libxl to start them, in which case it should
>> be able to do its own counting?  Or are you envisioning starting /
>> destroying secondary emulators as the guest is running?
>>
>
> That's an open question at the moment. I coded this series such that a secondary emulator can be started after the VM and hotplug its device. For some emulators (e.g. the one I'm working on to supply a console) it makes more sense for libxl to start it - but I see that as being additional to this series. I don't think we want to stipulate that libxl is the only way to kick off a secondary emulator.

Actually, I think in general we should always expect secondary
emulators to be started *through* libxl, so that we can have a
guaranteed interface; the question I meant to ask, I guess, was about
whether all emulators would be started *during domain creation* (in
which case libxl would know the number of emulators at the beginning),
or whether some might be started later (in which case libxl would not
necessarily know the number of emulators at the beginning).

The thing which is the simplest, and which keeps our options open, is
what you've done here -- to have an optional user config.  Then if we
add the ability to specify secondary emulators in the config file, we
can add "auto-counting" functionality at that time; and if we add
libxl functions to "hot-plug" secondary emulators, we'll need the user
config to make space.

Speaking of adding more emulators: After some more thinking, I'm not
sure that baking the layout of the ioreq and buf_ioreq pages into Xen,
the way the current patch series does, is a good idea.  At the moment,
you set HVM_PARAM_[BUF]IOREQ_PFN, and assume that all the emulators
will be contiguous.  Would it be better to introduce an interface to
allow arbitrary pages to be used for each ioreq server as it's created
(grandfathering in sid 0 to use the HVM_PARAMs)?  Then you wouldn't
need to mark off pfn space to use for ioreq servers during domain
creation at all.
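
Just to make that concrete, a purely hypothetical sketch (struct and field names invented here for illustration, not taken from the series):

  /* Hypothetical create op: the toolstack passes in an arbitrary pair
   * of guest pfns for the new server, instead of the pages being
   * carved out of a contiguous block at domain-build time. */
  struct xen_hvm_create_ioreq_server_pages {
      domid_t    domid;         /* IN: domain to be serviced */
      uint64_t   ioreq_pfn;     /* IN: page for synchronous ioreqs */
      uint64_t   bufioreq_pfn;  /* IN: page for buffered ioreqs */
      ioservid_t id;            /* OUT: server id allocated by Xen */
  };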

 -George


>
>> >      }
>> >
>> >      xlu_cfg_destroy(config);
>> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> > index fb2dd73..e8b73fa 100644
>> > --- a/xen/arch/x86/hvm/hvm.c
>> > +++ b/xen/arch/x86/hvm/hvm.c
>> > @@ -357,14 +357,21 @@ static ioreq_t *get_ioreq(struct
>> hvm_ioreq_server *s, int id)
>> >  bool_t hvm_io_pending(struct vcpu *v)
>> >  {
>> >      struct domain *d = v->domain;
>> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
>> > -    ioreq_t *p;
>> > +    struct list_head *entry;
>> >
>> > -    if ( !s )
>> > -        return 0;
>> > +    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
>> > +    {
>> > +        struct hvm_ioreq_server *s = list_entry(entry,
>> > +                                                struct hvm_ioreq_server,
>> > +                                                list_entry);
>> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
>> >
>> > -    p = get_ioreq(s, v->vcpu_id);
>> > -    return ( p->state != STATE_IOREQ_NONE );
>> > +        p = get_ioreq(s, v->vcpu_id);
>> > +        if ( p->state != STATE_IOREQ_NONE )
>> > +            return 1;
>>
>> Redundant calls to get_ioreq().
>
> Hmm. Looks like a patch rebase went a bit wrong there. Good spot.
>
>>
>> > +    }
>> > +
>> > +    return 0;
>> >  }
>> >
>> >  static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>>
>> [snip]
>>
>> > +static int hvm_access_cf8(
>> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>>
>> I take it this is part of virtualizing the pci space?
>>
>
> Yes, that needs to be done once you have more than one emulator.
>
>> This wasn't mentioned in the commit message; it seems like it probably
>> should have been introduced in a separate patch.
>>
>
> It's not actually needed until you have more than one emulator though so it doesn't really make sense to separate it. I'll amend the commit message to point out that, for secondary emulators, IO ranges and PCI devices need to be explicitly registered.

>>
>> Obviously we'll need to handle incoming migration from domains one
>> release back, but after that we should be able to get rid of them,
>> right?
>
> We'd probably want to support old versions of QEMU for a while right? The params (and the catch-all server) need to stick around as long as they do.

Oh, right -- yeah, if we want to be able to run arbitrary versions of
qemu we'll need to keep it around for a while.

 -George

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] ioreq-server: add support for multiple servers
  2014-03-11 16:48       ` George Dunlap
@ 2014-03-11 17:32         ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2014-03-11 17:32 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
[snip]
> 
> Speaking of adding more emulators: After some more thinking, I'm not
> sure that baking the layout of the ioreq and buf_ioreq pages into Xen,
> the way the current patch series does, is a good idea.  At the moment,
> you set HVM_PARAM_[BUF]IOREQ_PFN, and assume that all the emulators
> will be contiguous.  Would it be better to introduce an interface to
> allow arbitrary pages to be used for each ioreq server as it's created
> (grandfathering in sid 0 to use the HVM_PARAMs)?  Then you wouldn't
> need to mark off pfn space to use for ioreq servers during domain
> creation at all.
> 

Well, there still needs to be a suitable hole in guest pfn space. Would you be happy leaving the existing HVM params alone then, and adding a new pair of params for the base and range of a new pfn space (the range one replacing the max_emulators HVM param, since it'll be 2 * (max_emulators - 1))?
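
Concretely, something like the following - param names and values invented here purely to illustrate the layout:

  /* Hypothetical params, purely illustrative. The reserved range holds
   * 2 * (max_emulators - 1) pages: one synchronous and one buffered
   * ioreq page per secondary server; server 0 keeps using the existing
   * HVM_PARAM_IOREQ_PFN / HVM_PARAM_BUFIOREQ_PFN. */
  #define HVM_PARAM_IOREQ_SERVER_PFN       35  /* base pfn of the range */
  #define HVM_PARAM_NR_IOREQ_SERVER_PAGES  36  /* number of pages in it */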

  Paul

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-03-11 17:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-04 11:40 [PATCH v2 0/6] Support for running secondary emulators Paul Durrant
2014-03-04 11:40 ` [PATCH v2 1/6] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-03-04 12:21   ` Jan Beulich
2014-03-04 17:25     ` Paul Durrant
2014-03-04 11:40 ` [PATCH v2 2/6] ioreq-server: tidy up use of ioreq_t Paul Durrant
2014-03-04 11:40 ` [PATCH v2 3/6] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-03-04 12:50   ` Jan Beulich
2014-03-04 11:40 ` [PATCH v2 4/6] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-03-04 13:02   ` Jan Beulich
2014-03-04 13:30     ` Paul Durrant
2014-03-04 15:43       ` Jan Beulich
2014-03-04 11:40 ` [PATCH v2 5/6] ioreq-server: add support for multiple servers Paul Durrant
2014-03-04 12:06   ` Andrew Cooper
2014-03-05 14:44     ` Paul Durrant
2014-03-10 18:41   ` George Dunlap
2014-03-11 10:41     ` Paul Durrant
2014-03-11 10:52       ` Paul Durrant
2014-03-11 16:48       ` George Dunlap
2014-03-11 17:32         ` Paul Durrant
2014-03-04 11:40 ` [PATCH v2 6/6] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
