[PATCH v2 0/2] Extend ioreq-server to support page write protection

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] Extend ioreq-server to support page write protection
@ 2014-08-22 19:18 Wei Ye
  2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
  2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  0 siblings, 2 replies; 14+ messages in thread
From: Wei Ye @ 2014-08-22 19:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang, Wei Ye

Chnages from v1:
 - Changes the new p2m type name from p2m_ram_wp to p2m_mmio_write_dm. This
   means that we treat the pages as a special mmio range instead of ram.
 - Move macros to c file since only this file is using them.
 - Address various comments from Jan.

ioreq-server is proposed to forward PIO and MMIO request to multiple device
models according to the io range. XenGT (Intel Graphics Virtualization technology,
please refer to https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
xengt) driver reside in Dom0 as a virtual graphics device model also need to trap
and emulate the guest's write operation to some specific memory pages, like the
memory pages used by guest graphics driver as PPGTT(per-process graphics 
translation table). We add an new p2m type "p2m_mmio_write_dm" to trap the page
write operation. Page of this new p2m type is read only and for write, the request
will go to device model via ioreq-server.

Wei Ye (2):
  x86: add p2m_mmio_write_dm
  ioreq-server: Support scatter page forwarding

 tools/libxc/xc_domain.c          |   32 ++++++
 tools/libxc/xenctrl.h            |   18 ++++
 xen/arch/x86/hvm/hvm.c           |  212 +++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/mm/p2m-ept.c        |    1 +
 xen/arch/x86/mm/p2m-pt.c         |    1 +
 xen/include/asm-x86/hvm/domain.h |    9 ++
 xen/include/asm-x86/p2m.h        |    1 +
 xen/include/public/hvm/hvm_op.h  |   12 +++
 8 files changed, 285 insertions(+), 1 deletion(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 19:18 [PATCH v2 0/2] Extend ioreq-server to support page write protection Wei Ye
@ 2014-08-22 19:18 ` Wei Ye
  2014-08-22 10:13   ` Andrew Cooper
                     ` (2 more replies)
  2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  1 sibling, 3 replies; 14+ messages in thread
From: Wei Ye @ 2014-08-22 19:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang, Wei Ye

Add an new p2m type p2m_mmio_write_dm. Page of this type is read
only, and write will go to the device model for emulation just like
a mmio.

Signed-off-by: Wei Ye <wei.ye@intel.com>
---
 xen/arch/x86/hvm/hvm.c    |    3 ++-
 xen/arch/x86/mm/p2m-ept.c |    1 +
 xen/arch/x86/mm/p2m-pt.c  |    1 +
 xen/include/asm-x86/p2m.h |    1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 17ff011..4984149 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2739,7 +2739,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
      * to the mmio handler.
      */
     if ( (p2mt == p2m_mmio_dm) || 
-         (access_w && (p2mt == p2m_ram_ro)) )
+         (access_w && (p2mt == p2m_ram_ro)) ||
+         (access_w && (p2mt == p2m_mmio_write_dm)) )
     {
         put_gfn(p2m->domain, gfn);
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c6e83..f4c72d7 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -127,6 +127,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
         case p2m_ram_logdirty:
         case p2m_ram_ro:
         case p2m_ram_shared:
+        case p2m_mmio_write_dm:
             entry->r = entry->x = 1;
             entry->w = 0;
             break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 085ab6f..99b8b76 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -98,6 +98,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
     case p2m_ram_ro:
     case p2m_ram_logdirty:
     case p2m_ram_shared:
+    case p2m_mmio_write_dm:
         return flags | P2M_BASE_FLAGS;
     case p2m_ram_rw:
         return flags | P2M_BASE_FLAGS | _PAGE_RW;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 0ddbadb..523c7a9 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -71,6 +71,7 @@ typedef enum {
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
     p2m_map_foreign  = 14,        /* ram pages from foreign domain */
+    p2m_mmio_write_dm = 15,       /* Read-only; write go to the device model */
 } p2m_type_t;
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
@ 2014-08-22 10:13   ` Andrew Cooper
  2014-08-26  3:18     ` Ye, Wei
  2014-08-22 13:24   ` David Vrabel
  2014-08-25 10:46   ` Jan Beulich
  2 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2014-08-22 10:13 UTC (permalink / raw)
  To: Wei Ye, xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang

On 22/08/14 20:18, Wei Ye wrote:
> Add an new p2m type p2m_mmio_write_dm. Page of this type is read
> only, and write will go to the device model for emulation just like
> a mmio.
>
> Signed-off-by: Wei Ye <wei.ye@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c    |    3 ++-
>  xen/arch/x86/mm/p2m-ept.c |    1 +
>  xen/arch/x86/mm/p2m-pt.c  |    1 +
>  xen/include/asm-x86/p2m.h |    1 +
>  4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 17ff011..4984149 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2739,7 +2739,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>       * to the mmio handler.
>       */
>      if ( (p2mt == p2m_mmio_dm) || 
> -         (access_w && (p2mt == p2m_ram_ro)) )
> +         (access_w && (p2mt == p2m_ram_ro)) ||
> +         (access_w && (p2mt == p2m_mmio_write_dm)) )

Please adjust the position of the logical or.

~Andrew

>      {
>          put_gfn(p2m->domain, gfn);
>  
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 15c6e83..f4c72d7 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -127,6 +127,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>          case p2m_ram_logdirty:
>          case p2m_ram_ro:
>          case p2m_ram_shared:
> +        case p2m_mmio_write_dm:
>              entry->r = entry->x = 1;
>              entry->w = 0;
>              break;
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index 085ab6f..99b8b76 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -98,6 +98,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
>      case p2m_ram_ro:
>      case p2m_ram_logdirty:
>      case p2m_ram_shared:
> +    case p2m_mmio_write_dm:
>          return flags | P2M_BASE_FLAGS;
>      case p2m_ram_rw:
>          return flags | P2M_BASE_FLAGS | _PAGE_RW;
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 0ddbadb..523c7a9 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -71,6 +71,7 @@ typedef enum {
>      p2m_ram_shared = 12,          /* Shared or sharable memory */
>      p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
>      p2m_map_foreign  = 14,        /* ram pages from foreign domain */
> +    p2m_mmio_write_dm = 15,       /* Read-only; write go to the device model */
>  } p2m_type_t;
>  
>  /*

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 10:13   ` Andrew Cooper
@ 2014-08-26  3:18     ` Ye, Wei
  0 siblings, 0 replies; 14+ messages in thread
From: Ye, Wei @ 2014-08-26  3:18 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel@lists.xen.org
  Cc: Tian, Kevin, keir@xen.org, ian.campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, Dugger, Donald D,
	Paul.Durrant@citrix.com, Lv, Zhiyuan, JBeulich@suse.com,
	Zhang, Yang Z

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, August 22, 2014 6:13 PM
> To: Ye, Wei; xen-devel@lists.xen.org
> Cc: Tian, Kevin; keir@xen.org; ian.campbell@citrix.com;
> stefano.stabellini@eu.citrix.com; tim@xen.org; ian.jackson@eu.citrix.com;
> Dugger, Donald D; Paul.Durrant@citrix.com; Lv, Zhiyuan; JBeulich@suse.com;
> Zhang, Yang Z
> Subject: Re: [Xen-devel] [PATCH v2 1/2] x86: add p2m_mmio_write_dm
> 
> On 22/08/14 20:18, Wei Ye wrote:
> > Add an new p2m type p2m_mmio_write_dm. Page of this type is read only,
> > and write will go to the device model for emulation just like a mmio.
> >
> > Signed-off-by: Wei Ye <wei.ye@intel.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c    |    3 ++-
> >  xen/arch/x86/mm/p2m-ept.c |    1 +
> >  xen/arch/x86/mm/p2m-pt.c  |    1 +
> >  xen/include/asm-x86/p2m.h |    1 +
> >  4 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index
> > 17ff011..4984149 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2739,7 +2739,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >       * to the mmio handler.
> >       */
> >      if ( (p2mt == p2m_mmio_dm) ||
> > -         (access_w && (p2mt == p2m_ram_ro)) )
> > +         (access_w && (p2mt == p2m_ram_ro)) ||
> > +         (access_w && (p2mt == p2m_mmio_write_dm)) )
> 
> Please adjust the position of the logical or.
> 
Ok.

> ~Andrew
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
  2014-08-22 10:13   ` Andrew Cooper
@ 2014-08-22 13:24   ` David Vrabel
  2014-08-25 22:17     ` Tian, Kevin
  2014-08-25 10:46   ` Jan Beulich
  2 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2014-08-22 13:24 UTC (permalink / raw)
  To: Wei Ye, xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang

On 22/08/14 20:18, Wei Ye wrote:
> Add an new p2m type p2m_mmio_write_dm. Page of this type is read
> only, and write will go to the device model for emulation just like
> a mmio.

This type appears to be identical in behaviour to p2m_ram_ro?

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 13:24   ` David Vrabel
@ 2014-08-25 22:17     ` Tian, Kevin
  0 siblings, 0 replies; 14+ messages in thread
From: Tian, Kevin @ 2014-08-25 22:17 UTC (permalink / raw)
  To: David Vrabel, Ye, Wei, xen-devel@lists.xen.org
  Cc: keir@xen.org, ian.campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, Dugger, Donald D,
	Paul.Durrant@citrix.com, Lv, Zhiyuan, JBeulich@suse.com,
	Zhang, Yang Z

> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Friday, August 22, 2014 6:24 AM
> 
> On 22/08/14 20:18, Wei Ye wrote:
> > Add an new p2m type p2m_mmio_write_dm. Page of this type is read
> > only, and write will go to the device model for emulation just like
> > a mmio.
> 
> This type appears to be identical in behaviour to p2m_ram_ro?
> 

no. p2m_ram_ro abandons writes, while p2m_mmio_write_dm emulates
writes.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
  2014-08-22 10:13   ` Andrew Cooper
  2014-08-22 13:24   ` David Vrabel
@ 2014-08-25 10:46   ` Jan Beulich
  2014-08-25 22:25     ` Tian, Kevin
  2 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2014-08-25 10:46 UTC (permalink / raw)
  To: Wei Ye
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, xen-devel, Paul.Durrant, zhiyuan.lv,
	yang.z.zhang

>>> On 22.08.14 at 21:18, <wei.ye@intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -127,6 +127,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
>          case p2m_ram_logdirty:
>          case p2m_ram_ro:
>          case p2m_ram_shared:
> +        case p2m_mmio_write_dm:
>              entry->r = entry->x = 1;
>              entry->w = 0;
>              break;

Is that really what you want? I.e. should such pages be executable?

> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -98,6 +98,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
>      case p2m_ram_ro:
>      case p2m_ram_logdirty:
>      case p2m_ram_shared:
> +    case p2m_mmio_write_dm:
>          return flags | P2M_BASE_FLAGS;

Same here - this would likely better be grouped with
p2m_grant_map_ro.

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -71,6 +71,7 @@ typedef enum {
>      p2m_ram_shared = 12,          /* Shared or sharable memory */
>      p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
>      p2m_map_foreign  = 14,        /* ram pages from foreign domain */
> +    p2m_mmio_write_dm = 15,       /* Read-only; write go to the device model */
>  } p2m_type_t;

I'm still not really convinced introducing a new p2m type is the right
approach here, but I'll leave the final decision to Tim.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] x86: add p2m_mmio_write_dm
  2014-08-25 10:46   ` Jan Beulich
@ 2014-08-25 22:25     ` Tian, Kevin
  0 siblings, 0 replies; 14+ messages in thread
From: Tian, Kevin @ 2014-08-25 22:25 UTC (permalink / raw)
  To: Jan Beulich, Ye, Wei
  Cc: keir@xen.org, ian.campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Paul.Durrant@citrix.com, Lv, Zhiyuan,
	Zhang, Yang Z

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, August 25, 2014 3:46 AM
> 
> >>> On 22.08.14 at 21:18, <wei.ye@intel.com> wrote:
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -127,6 +127,7 @@ static void ept_p2m_type_to_flags(ept_entry_t
> *entry, p2m_type_t type, p2m_acces
> >          case p2m_ram_logdirty:
> >          case p2m_ram_ro:
> >          case p2m_ram_shared:
> > +        case p2m_mmio_write_dm:
> >              entry->r = entry->x = 1;
> >              entry->w = 0;
> >              break;
> 
> Is that really what you want? I.e. should such pages be executable?
> 
> > --- a/xen/arch/x86/mm/p2m-pt.c
> > +++ b/xen/arch/x86/mm/p2m-pt.c
> > @@ -98,6 +98,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t,
> mfn_t mfn)
> >      case p2m_ram_ro:
> >      case p2m_ram_logdirty:
> >      case p2m_ram_shared:
> > +    case p2m_mmio_write_dm:
> >          return flags | P2M_BASE_FLAGS;
> 
> Same here - this would likely better be grouped with
> p2m_grant_map_ro.
> 
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -71,6 +71,7 @@ typedef enum {
> >      p2m_ram_shared = 12,          /* Shared or sharable memory */
> >      p2m_ram_broken = 13,          /* Broken page, access cause
> domain crash */
> >      p2m_map_foreign  = 14,        /* ram pages from foreign
> domain */
> > +    p2m_mmio_write_dm = 15,       /* Read-only; write go to the
> device model */
> >  } p2m_type_t;
> 
> I'm still not really convinced introducing a new p2m type is the right
> approach here, but I'll leave the final decision to Tim.
> 

>From previous discussion Tim is OK with either approach, i.e. either adding
a new p2m type for plain read-only mapping, or treating it as a plain
mmio_dm. Given adding such a new type is straightforward, we want to
pursue new p2m type to implement real write-protection behavior.

Tim please correct me if you see other issues.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-22 19:18 [PATCH v2 0/2] Extend ioreq-server to support page write protection Wei Ye
  2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
@ 2014-08-22 19:18 ` Wei Ye
  2014-08-22 10:35   ` Andrew Cooper
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Wei Ye @ 2014-08-22 19:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang, Wei Ye

Extend the interface to support add/remove scatter page list to be
forwarded by a dedicated ioreq-server instance. Check and select
a right ioreq-server instance for forwarding the write operation
for a write protected page.

Signed-off-by: Wei Ye <wei.ye@intel.com>
---
 tools/libxc/xc_domain.c          |   32 ++++++
 tools/libxc/xenctrl.h            |   18 ++++
 xen/arch/x86/hvm/hvm.c           |  209 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |    9 ++
 xen/include/public/hvm/hvm_op.h  |   12 +++
 5 files changed, 280 insertions(+)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 37ed141..36e4e59 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1485,6 +1485,38 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
     return rc;
 }
 
+int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                     ioservid_t id, uint16_t set,
+                                     uint16_t nr_pages, uint64_t *gpfn)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t, arg);
+    int pg, rc = -1;
+
+    if ( arg == NULL )
+        return -1;
+    
+    if ( nr_pages > XEN_IOREQSVR_MAX_MAP_BATCH )
+        goto out;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pages_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->set = set;
+    arg->nr_pages = nr_pages;
+    for ( pg = 0; pg < nr_pages; pg++ )
+        arg->page_list[pg] = gpfn[pg];
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index b55d857..84e8465 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1943,6 +1943,24 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
                                           uint8_t function);
 
 /**
+ * This function registers/deregisters a set of pages to be write protected.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm set whether registers or deregisters: 1 for register, 0 for deregister
+ * @parm nr_pages the count of pages
+ * @parm pgfn the guest page frame number list
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t set,
+                                          uint16_t nr_pages,
+                                          uint64_t *gpfn);
+
+/**
  * This function destroys an IOREQ Server.
  *
  * @parm xch a handle to an open hypervisor interface.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4984149..bbbacc3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -68,6 +68,12 @@
 #include <public/mem_event.h>
 #include <xen/rangeset.h>
 
+#define PGLIST_HASH_SIZE_SHIFT    8
+#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
+#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
+#define PGLIST_INVALID_GPFN       0
+#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
+
 bool_t __read_mostly hvm_enabled;
 
 unsigned int opt_hvm_debug_level __read_mostly;
@@ -744,6 +750,98 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
+static struct pglist_hash_table *hvm_ioreq_server_lookup_pglist_hash_table(
+                          struct pglist_hash_table *pglist_ht, uint64_t gpfn)
+{
+    unsigned int index = pglist_hash(gpfn);
+    struct pglist_hash_table *entry;
+
+    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
+        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )
+            break;
+
+    return entry;
+}
+
+static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table *chain)
+{
+    struct pglist_hash_table *p = chain;
+    struct pglist_hash_table *n;
+
+    while ( p )
+    {
+        n = p->next;
+        xfree(p);
+        p = n;
+    }
+}
+
+static void hvm_ioreq_server_free_pglist_hash(struct pglist_hash_table *pglist_ht)
+{
+    unsigned int i;
+
+    for ( i = 0; i < PGLIST_HASH_SIZE; i++ )
+        if ( pglist_ht[i].next != NULL )
+            hvm_ioreq_server_free_hash_chain(pglist_ht[i].next);
+}
+
+static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table *pglist_ht,
+                                            uint64_t gpfn)
+{
+    unsigned int index = pglist_hash(gpfn);
+    struct pglist_hash_table *ne;
+
+    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) != NULL )
+        return -EINVAL;
+
+    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
+        pglist_ht[index].gpfn = gpfn;
+    else
+    {
+        ne = xmalloc(struct pglist_hash_table);
+        if ( !ne )
+            return -ENOMEM;
+        ne->next = pglist_ht[index].next;
+        ne->gpfn = gpfn;
+        pglist_ht[index].next = ne;
+    }
+
+    return 0;
+}
+
+static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table *pglist_ht,
+                                            uint64_t gpfn)
+{
+    unsigned int index = pglist_hash(gpfn);
+    struct pglist_hash_table *next, *prev;
+
+    if ( pglist_ht[index].gpfn == gpfn )
+        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
+    else
+    {
+        prev = &pglist_ht[index];
+        while ( 1 )
+        {
+            next = prev->next;
+            if ( !next )
+            {
+                printk(XENLOG_G_WARNING "hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
+                       " %lx not found\n", pglist_ht, gpfn);
+                return -EINVAL;
+            }
+            if ( next->gpfn == gpfn )
+            {
+                prev->next = next->next;
+                xfree(next);
+                break;
+            }
+            prev = next;
+         }
+    }
+
+    return 0;
+}
+
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
                                       bool_t is_default, bool_t handle_bufioreq)
 {
@@ -948,7 +1046,14 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     spin_lock_init(&s->lock);
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
+    spin_lock_init(&s->pglist_hash_lock);
 
+    rc = -ENOMEM;
+    s->pglist_hash_base = xmalloc_array(struct pglist_hash_table, PGLIST_HASH_SIZE);
+    if ( s->pglist_hash_base == NULL )
+        goto fail1;
+    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);
+ 
     rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
     if ( rc )
         goto fail1;
@@ -1087,6 +1192,9 @@ static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
         
         hvm_ioreq_server_deinit(s, 0);
 
+        hvm_ioreq_server_free_pglist_hash(s->pglist_hash_base);
+        xfree(s->pglist_hash_base);
+
         domain_unpause(d);
 
         xfree(s);
@@ -1279,6 +1387,63 @@ static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     return rc;
 }
 
+static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint16_t set, uint16_t nr_pages,
+                                         uint64_t *gpfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+    unsigned int i;
+    p2m_type_t ot, nt;
+
+    if ( set )
+    {
+        ot = p2m_ram_rw;
+        nt = p2m_mmio_write_dm;
+    }
+    else
+    {
+        ot = p2m_mmio_write_dm;
+        nt = p2m_ram_rw;
+    }
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s != d->arch.hvm_domain.default_ioreq_server &&
+             s->id == id )
+        {
+            spin_lock(&s->pglist_hash_lock);
+
+            for ( i = 0; i < nr_pages; i++ )
+            {
+                rc = p2m_change_type_one(d, gpfn[i], ot, nt);
+                if ( !rc )
+                {
+                    if ( set )
+                        rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base, gpfn[i]);
+                    else
+                        rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base, gpfn[i]);
+                }
+
+                if ( rc )
+                    break;
+            }
+
+            spin_unlock(&s->pglist_hash_lock);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
 static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
     struct hvm_ioreq_server *s;
@@ -2349,6 +2514,8 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         switch ( type )
         {
             unsigned long end;
+            uint64_t gpfn;
+            struct pglist_hash_table *he;
 
         case IOREQ_TYPE_PIO:
             end = addr + p->size - 1;
@@ -2361,6 +2528,14 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             if ( rangeset_contains_range(r, addr, end) )
                 return s;
 
+            gpfn = addr >> PAGE_SHIFT;
+            spin_lock(&s->pglist_hash_lock);
+            he = hvm_ioreq_server_lookup_pglist_hash_table(s->pglist_hash_base, gpfn);
+            spin_unlock(&s->pglist_hash_lock);
+
+            if ( he )
+                return s;
+
             break;
         case IOREQ_TYPE_PCI_CONFIG:
             if ( rangeset_contains_singleton(r, addr >> 32) )
@@ -5309,6 +5484,35 @@ static int hvmop_set_ioreq_server_state(
     return rc;
 }
 
+static int hvmop_map_pages_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pages_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pages_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_map_pages_to_ioreq_server);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_map_pages_to_ioreq_server(d, op.id, op.set, op.nr_pages, op.page_list);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_destroy_ioreq_server(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
 {
@@ -5374,6 +5578,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
         break;
     
+    case HVMOP_map_pages_to_ioreq_server:
+        rc = hvmop_map_pages_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pages_to_ioreq_server_t));
+        break;
+ 
     case HVMOP_destroy_ioreq_server:
         rc = hvmop_destroy_ioreq_server(
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 291a2e0..c28fbbe 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -48,6 +48,11 @@ struct hvm_ioreq_vcpu {
     evtchn_port_t    ioreq_evtchn;
 };
 
+struct pglist_hash_table {
+    struct pglist_hash_table *next;
+    uint64_t gpfn;
+};
+
 #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
 #define MAX_NR_IO_RANGES  256
 
@@ -70,6 +75,10 @@ struct hvm_ioreq_server {
     evtchn_port_t          bufioreq_evtchn;
     struct rangeset        *range[NR_IO_RANGE_TYPES];
     bool_t                 enabled;
+
+    /* scatter page list need write protection */
+    struct pglist_hash_table   *pglist_hash_base;
+    spinlock_t             pglist_hash_lock;
 };
 
 struct hvm_domain {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..f7c989a 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
 typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
+#define HVMOP_map_pages_to_ioreq_server 23
+#define XEN_IOREQSVR_MAX_MAP_BATCH      128
+struct xen_hvm_map_pages_to_ioreq_server {
+    domid_t domid;   /* IN - domain to be serviced */
+    ioservid_t id;   /* IN - server id */
+    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
+    uint16_t nr_pages;  /* IN - page count */
+    uint64_t page_list[XEN_IOREQSVR_MAX_MAP_BATCH]; /* IN - gpfn list */
+};
+typedef struct xen_hvm_map_pages_to_ioreq_server xen_hvm_map_pages_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pages_to_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
@ 2014-08-22 10:35   ` Andrew Cooper
  2014-08-26  8:40     ` Ye, Wei
  2014-08-26 11:35   ` Paul Durrant
  2014-08-26 13:26   ` Jan Beulich
  2 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2014-08-22 10:35 UTC (permalink / raw)
  To: Wei Ye, xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, Paul.Durrant, zhiyuan.lv, JBeulich,
	yang.z.zhang

On 22/08/14 20:18, Wei Ye wrote:
> Extend the interface to support add/remove scatter page list to be
> forwarded by a dedicated ioreq-server instance. Check and select
> a right ioreq-server instance for forwarding the write operation
> for a write protected page.
>
> Signed-off-by: Wei Ye <wei.ye@intel.com>
> ---
>  tools/libxc/xc_domain.c          |   32 ++++++
>  tools/libxc/xenctrl.h            |   18 ++++
>  xen/arch/x86/hvm/hvm.c           |  209 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/domain.h |    9 ++
>  xen/include/public/hvm/hvm_op.h  |   12 +++
>  5 files changed, 280 insertions(+)
>
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 37ed141..36e4e59 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1485,6 +1485,38 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
>      return rc;
>  }
>  
> +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                     ioservid_t id, uint16_t set,
> +                                     uint16_t nr_pages, uint64_t *gpfn)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t, arg);
> +    int pg, rc = -1;
> +
> +    if ( arg == NULL )
> +        return -1;

You must set errno before exiting -1.

> +    
> +    if ( nr_pages > XEN_IOREQSVR_MAX_MAP_BATCH )
> +        goto out;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_pages_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->set = set;
> +    arg->nr_pages = nr_pages;
> +    for ( pg = 0; pg < nr_pages; pg++ )
> +        arg->page_list[pg] = gpfn[pg];

memcpy()

> +
> +    rc = do_xen_hypercall(xch, &hypercall);
> +
> +out:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>                                  domid_t domid,
>                                  ioservid_t id)
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index b55d857..84e8465 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1943,6 +1943,24 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
>                                            uint8_t function);
>  
>  /**
> + * This function registers/deregisters a set of pages to be write protected.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm set whether registers or deregisters: 1 for register, 0 for deregister
> + * @parm nr_pages the count of pages
> + * @parm pgfn the guest page frame number list
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch,
> +                                          domid_t domid,
> +                                          ioservid_t id,
> +                                          uint16_t set,
> +                                          uint16_t nr_pages,
> +                                          uint64_t *gpfn);

Alignment of parameters.

> +
> +/**
>   * This function destroys an IOREQ Server.
>   *
>   * @parm xch a handle to an open hypervisor interface.
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4984149..bbbacc3 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -68,6 +68,12 @@
>  #include <public/mem_event.h>
>  #include <xen/rangeset.h>

This hash chain code belongs in its own patch.  Please split it out, to
allow the hypercall interface changes to be in a smaller patch.

>  
> +#define PGLIST_HASH_SIZE_SHIFT    8
> +#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
> +#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
> +#define PGLIST_INVALID_GPFN       0

Use INVALID_MFN.  GFN 0 is perfectly legitimate as the target of scatter
forwarding.

> +#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
> +
>  bool_t __read_mostly hvm_enabled;
>  
>  unsigned int opt_hvm_debug_level __read_mostly;
> @@ -744,6 +750,98 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>      spin_unlock(&s->lock);
>  }
>  
> +static struct pglist_hash_table *hvm_ioreq_server_lookup_pglist_hash_table(
> +                          struct pglist_hash_table *pglist_ht, uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *entry;
> +
> +    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
> +        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )
> +            break;
> +
> +    return entry;
> +}
> +
> +static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table *chain)
> +{
> +    struct pglist_hash_table *p = chain;
> +    struct pglist_hash_table *n;
> +
> +    while ( p )
> +    {
> +        n = p->next;
> +        xfree(p);
> +        p = n;
> +    }
> +}
> +
> +static void hvm_ioreq_server_free_pglist_hash(struct pglist_hash_table *pglist_ht)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < PGLIST_HASH_SIZE; i++ )
> +        if ( pglist_ht[i].next != NULL )
> +            hvm_ioreq_server_free_hash_chain(pglist_ht[i].next);
> +}
> +
> +static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table *pglist_ht,
> +                                            uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *ne;
> +
> +    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) != NULL )
> +        return -EINVAL;
> +
> +    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
> +        pglist_ht[index].gpfn = gpfn;
> +    else
> +    {
> +        ne = xmalloc(struct pglist_hash_table);
> +        if ( !ne )
> +            return -ENOMEM;
> +        ne->next = pglist_ht[index].next;
> +        ne->gpfn = gpfn;
> +        pglist_ht[index].next = ne;
> +    }
> +
> +    return 0;
> +}
> +
> +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table *pglist_ht,
> +                                            uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *next, *prev;
> +
> +    if ( pglist_ht[index].gpfn == gpfn )
> +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> +    else
> +    {
> +        prev = &pglist_ht[index];
> +        while ( 1 )
> +        {
> +            next = prev->next;
> +            if ( !next )
> +            {
> +                printk(XENLOG_G_WARNING "hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
> +                       " %lx not found\n", pglist_ht, gpfn);
> +                return -EINVAL;
> +            }
> +            if ( next->gpfn == gpfn )
> +            {
> +                prev->next = next->next;
> +                xfree(next);
> +                break;
> +            }
> +            prev = next;
> +         }
> +    }
> +
> +    return 0;
> +}
> +
>  static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
>                                        bool_t is_default, bool_t handle_bufioreq)
>  {
> @@ -948,7 +1046,14 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
>      spin_lock_init(&s->lock);
>      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
>      spin_lock_init(&s->bufioreq_lock);
> +    spin_lock_init(&s->pglist_hash_lock);
>  
> +    rc = -ENOMEM;
> +    s->pglist_hash_base = xmalloc_array(struct pglist_hash_table, PGLIST_HASH_SIZE);
> +    if ( s->pglist_hash_base == NULL )
> +        goto fail1;
> +    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);

xzalloc_array()

> + 
>      rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
>      if ( rc )
>          goto fail1;
> @@ -1087,6 +1192,9 @@ static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>          
>          hvm_ioreq_server_deinit(s, 0);
>  
> +        hvm_ioreq_server_free_pglist_hash(s->pglist_hash_base);
> +        xfree(s->pglist_hash_base);
> +
>          domain_unpause(d);
>  
>          xfree(s);
> @@ -1279,6 +1387,63 @@ static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                         uint16_t set, uint16_t nr_pages,
> +                                         uint64_t *gpfn)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +    unsigned int i;
> +    p2m_type_t ot, nt;
> +
> +    if ( set )
> +    {
> +        ot = p2m_ram_rw;
> +        nt = p2m_mmio_write_dm;
> +    }
> +    else
> +    {
> +        ot = p2m_mmio_write_dm;
> +        nt = p2m_ram_rw;
> +    }
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s != d->arch.hvm_domain.default_ioreq_server &&
> +             s->id == id )
> +        {
> +            spin_lock(&s->pglist_hash_lock);
> +
> +            for ( i = 0; i < nr_pages; i++ )
> +            {
> +                rc = p2m_change_type_one(d, gpfn[i], ot, nt);
> +                if ( !rc )
> +                {
> +                    if ( set )
> +                        rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base, gpfn[i]);
> +                    else
> +                        rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base, gpfn[i]);
> +                }
> +
> +                if ( rc )
> +                    break;
> +            }
> +
> +            spin_unlock(&s->pglist_hash_lock);
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    return rc;
> +}
> +
>  static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
>  {
>      struct hvm_ioreq_server *s;
> @@ -2349,6 +2514,8 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>          switch ( type )
>          {
>              unsigned long end;
> +            uint64_t gpfn;
> +            struct pglist_hash_table *he;
>  
>          case IOREQ_TYPE_PIO:
>              end = addr + p->size - 1;
> @@ -2361,6 +2528,14 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>              if ( rangeset_contains_range(r, addr, end) )
>                  return s;
>  
> +            gpfn = addr >> PAGE_SHIFT;
> +            spin_lock(&s->pglist_hash_lock);
> +            he = hvm_ioreq_server_lookup_pglist_hash_table(s->pglist_hash_base, gpfn);
> +            spin_unlock(&s->pglist_hash_lock);
> +
> +            if ( he )
> +                return s;
> +
>              break;
>          case IOREQ_TYPE_PCI_CONFIG:
>              if ( rangeset_contains_singleton(r, addr >> 32) )
> @@ -5309,6 +5484,35 @@ static int hvmop_set_ioreq_server_state(
>      return rc;
>  }
>  
> +static int hvmop_map_pages_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pages_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_pages_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_map_pages_to_ioreq_server);
> +    if ( rc != 0 )
> +        goto out;
> +
> +    rc = hvm_map_pages_to_ioreq_server(d, op.id, op.set, op.nr_pages, op.page_list);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  static int hvmop_destroy_ioreq_server(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
>  {
> @@ -5374,6 +5578,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
>          break;
>      
> +    case HVMOP_map_pages_to_ioreq_server:
> +        rc = hvmop_map_pages_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_pages_to_ioreq_server_t));
> +        break;
> + 
>      case HVMOP_destroy_ioreq_server:
>          rc = hvmop_destroy_ioreq_server(
>              guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 291a2e0..c28fbbe 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -48,6 +48,11 @@ struct hvm_ioreq_vcpu {
>      evtchn_port_t    ioreq_evtchn;
>  };
>  
> +struct pglist_hash_table {
> +    struct pglist_hash_table *next;
> +    uint64_t gpfn;
> +};
> +
>  #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
>  #define MAX_NR_IO_RANGES  256
>  
> @@ -70,6 +75,10 @@ struct hvm_ioreq_server {
>      evtchn_port_t          bufioreq_evtchn;
>      struct rangeset        *range[NR_IO_RANGE_TYPES];
>      bool_t                 enabled;
> +
> +    /* scatter page list need write protection */
> +    struct pglist_hash_table   *pglist_hash_base;
> +    spinlock_t             pglist_hash_lock;
>  };
>  
>  struct hvm_domain {
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index eeb0a60..f7c989a 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>  
> +#define HVMOP_map_pages_to_ioreq_server 23
> +#define XEN_IOREQSVR_MAX_MAP_BATCH      128
> +struct xen_hvm_map_pages_to_ioreq_server {
> +    domid_t domid;   /* IN - domain to be serviced */
> +    ioservid_t id;   /* IN - server id */
> +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
> +    uint16_t nr_pages;  /* IN - page count */
> +    uint64_t page_list[XEN_IOREQSVR_MAX_MAP_BATCH]; /* IN - gpfn list */

No arbitrary limits like this in the API please.  Use a GUEST_HANDLE_64
instead.

~Andrew

> +};
> +typedef struct xen_hvm_map_pages_to_ioreq_server xen_hvm_map_pages_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pages_to_ioreq_server_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-22 10:35   ` Andrew Cooper
@ 2014-08-26  8:40     ` Ye, Wei
  2014-08-26  9:37       ` Andrew Cooper
  0 siblings, 1 reply; 14+ messages in thread
From: Ye, Wei @ 2014-08-26  8:40 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel@lists.xen.org
  Cc: Tian, Kevin, keir@xen.org, ian.campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, Dugger, Donald D,
	Paul.Durrant@citrix.com, Lv, Zhiyuan, JBeulich@suse.com,
	Zhang, Yang Z



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, August 22, 2014 6:35 PM
> To: Ye, Wei; xen-devel@lists.xen.org
> Cc: Tian, Kevin; keir@xen.org; ian.campbell@citrix.com;
> stefano.stabellini@eu.citrix.com; tim@xen.org; ian.jackson@eu.citrix.com;
> Dugger, Donald D; Paul.Durrant@citrix.com; Lv, Zhiyuan; JBeulich@suse.com;
> Zhang, Yang Z
> Subject: Re: [Xen-devel] [PATCH v2 2/2] ioreq-server: Support scatter page
> forwarding
> 
> On 22/08/14 20:18, Wei Ye wrote:
> > Extend the interface to support add/remove scatter page list to be
> > forwarded by a dedicated ioreq-server instance. Check and select a
> > right ioreq-server instance for forwarding the write operation for a
> > write protected page.
> >
> > Signed-off-by: Wei Ye <wei.ye@intel.com>
> > ---
> >  tools/libxc/xc_domain.c          |   32 ++++++
> >  tools/libxc/xenctrl.h            |   18 ++++
> >  xen/arch/x86/hvm/hvm.c           |  209
> ++++++++++++++++++++++++++++++++++++++
> >  xen/include/asm-x86/hvm/domain.h |    9 ++
> >  xen/include/public/hvm/hvm_op.h  |   12 +++
> >  5 files changed, 280 insertions(+)
> >
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index
> > 37ed141..36e4e59 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1485,6 +1485,38 @@ int
> xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t
> domid,
> >      return rc;
> >  }
> >
> > +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> > +                                     ioservid_t id, uint16_t set,
> > +                                     uint16_t nr_pages, uint64_t
> > +*gpfn) {
> > +    DECLARE_HYPERCALL;
> > +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t,
> arg);
> > +    int pg, rc = -1;
> > +
> > +    if ( arg == NULL )
> > +        return -1;
> 
> You must set errno before exiting -1.
> 
I'm not sure what kind of errno shoud be set. I just follow the similar existing functions like
"xc_hvm_map_io_range_to_ioreq_server...", it also didn't set errno before exiting. What's
Your suggestion for the errno?

> > +
> > +    if ( nr_pages > XEN_IOREQSVR_MAX_MAP_BATCH )
> > +        goto out;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_map_pages_to_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->set = set;
> > +    arg->nr_pages = nr_pages;
> > +    for ( pg = 0; pg < nr_pages; pg++ )
> > +        arg->page_list[pg] = gpfn[pg];
> 
> memcpy()
> 
Ok.

> > +
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +
> > +out:
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> >  int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> >                                  domid_t domid,
> >                                  ioservid_t id) diff --git
> > a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index b55d857..84e8465
> > 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1943,6 +1943,24 @@ int
> xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> >                                            uint8_t function);
> >
> >  /**
> > + * This function registers/deregisters a set of pages to be write protected.
> > + *
> > + * @parm xch a handle to an open hypervisor interface.
> > + * @parm domid the domain id to be serviced
> > + * @parm id the IOREQ Server id.
> > + * @parm set whether registers or deregisters: 1 for register, 0 for
> > +deregister
> > + * @parm nr_pages the count of pages
> > + * @parm pgfn the guest page frame number list
> > + * @return 0 on success, -1 on failure.
> > + */
> > +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch,
> > +                                          domid_t domid,
> > +                                          ioservid_t id,
> > +                                          uint16_t set,
> > +                                          uint16_t nr_pages,
> > +                                          uint64_t *gpfn);
> 
> Alignment of parameters.
Ok.

> 
> > +
> > +/**
> >   * This function destroys an IOREQ Server.
> >   *
> >   * @parm xch a handle to an open hypervisor interface.
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index
> > 4984149..bbbacc3 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -68,6 +68,12 @@
> >  #include <public/mem_event.h>
> >  #include <xen/rangeset.h>
> 
> This hash chain code belongs in its own patch.  Please split it out, to allow the
> hypercall interface changes to be in a smaller patch.
> 
Will split the hash chain code into a single patch, thanks.

> >
> > +#define PGLIST_HASH_SIZE_SHIFT    8
> > +#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
> > +#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
> > +#define PGLIST_INVALID_GPFN       0
> 
> Use INVALID_MFN.  GFN 0 is perfectly legitimate as the target of scatter
> forwarding.
Ok.


> 
> > +#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
> > +
> >  bool_t __read_mostly hvm_enabled;
> >
> >  unsigned int opt_hvm_debug_level __read_mostly; @@ -744,6 +750,98
> @@
> > static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server
> *s)
> >      spin_unlock(&s->lock);
> >  }
> >
> > +static struct pglist_hash_table
> *hvm_ioreq_server_lookup_pglist_hash_table(
> > +                          struct pglist_hash_table *pglist_ht,
> > +uint64_t gpfn) {
> > +    unsigned int index = pglist_hash(gpfn);
> > +    struct pglist_hash_table *entry;
> > +
> > +    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
> > +        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )
> > +            break;
> > +
> > +    return entry;
> > +}
> > +
> > +static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table
> > +*chain) {
> > +    struct pglist_hash_table *p = chain;
> > +    struct pglist_hash_table *n;
> > +
> > +    while ( p )
> > +    {
> > +        n = p->next;
> > +        xfree(p);
> > +        p = n;
> > +    }
> > +}
> > +
> > +static void hvm_ioreq_server_free_pglist_hash(struct
> > +pglist_hash_table *pglist_ht) {
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < PGLIST_HASH_SIZE; i++ )
> > +        if ( pglist_ht[i].next != NULL )
> > +            hvm_ioreq_server_free_hash_chain(pglist_ht[i].next);
> > +}
> > +
> > +static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table
> *pglist_ht,
> > +                                            uint64_t gpfn) {
> > +    unsigned int index = pglist_hash(gpfn);
> > +    struct pglist_hash_table *ne;
> > +
> > +    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) !=
> NULL )
> > +        return -EINVAL;
> > +
> > +    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
> > +        pglist_ht[index].gpfn = gpfn;
> > +    else
> > +    {
> > +        ne = xmalloc(struct pglist_hash_table);
> > +        if ( !ne )
> > +            return -ENOMEM;
> > +        ne->next = pglist_ht[index].next;
> > +        ne->gpfn = gpfn;
> > +        pglist_ht[index].next = ne;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table
> *pglist_ht,
> > +                                            uint64_t gpfn) {
> > +    unsigned int index = pglist_hash(gpfn);
> > +    struct pglist_hash_table *next, *prev;
> > +
> > +    if ( pglist_ht[index].gpfn == gpfn )
> > +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> > +    else
> > +    {
> > +        prev = &pglist_ht[index];
> > +        while ( 1 )
> > +        {
> > +            next = prev->next;
> > +            if ( !next )
> > +            {
> > +                printk(XENLOG_G_WARNING
> "hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
> > +                       " %lx not found\n", pglist_ht, gpfn);
> > +                return -EINVAL;
> > +            }
> > +            if ( next->gpfn == gpfn )
> > +            {
> > +                prev->next = next->next;
> > +                xfree(next);
> > +                break;
> > +            }
> > +            prev = next;
> > +         }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
> >                                        bool_t is_default, bool_t
> > handle_bufioreq)  { @@ -948,7 +1046,14 @@ static int
> > hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
> >      spin_lock_init(&s->lock);
> >      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
> >      spin_lock_init(&s->bufioreq_lock);
> > +    spin_lock_init(&s->pglist_hash_lock);
> >
> > +    rc = -ENOMEM;
> > +    s->pglist_hash_base = xmalloc_array(struct pglist_hash_table,
> PGLIST_HASH_SIZE);
> > +    if ( s->pglist_hash_base == NULL )
> > +        goto fail1;
> > +    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE *
> > + PGLIST_HASH_SIZE);
> 
> xzalloc_array()
Ok.


> 
> > +
> >      rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
> >      if ( rc )
> >          goto fail1;
> > @@ -1087,6 +1192,9 @@ static int hvm_destroy_ioreq_server(struct
> > domain *d, ioservid_t id)
> >
> >          hvm_ioreq_server_deinit(s, 0);
> >
> > +        hvm_ioreq_server_free_pglist_hash(s->pglist_hash_base);
> > +        xfree(s->pglist_hash_base);
> > +
> >          domain_unpause(d);
> >
> >          xfree(s);
> > @@ -1279,6 +1387,63 @@ static int hvm_set_ioreq_server_state(struct
> domain *d, ioservid_t id,
> >      return rc;
> >  }
> >
> > +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t
> id,
> > +                                         uint16_t set, uint16_t nr_pages,
> > +                                         uint64_t *gpfn) {
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +    unsigned int i;
> > +    p2m_type_t ot, nt;
> > +
> > +    if ( set )
> > +    {
> > +        ot = p2m_ram_rw;
> > +        nt = p2m_mmio_write_dm;
> > +    }
> > +    else
> > +    {
> > +        ot = p2m_mmio_write_dm;
> > +        nt = p2m_ram_rw;
> > +    }
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s != d->arch.hvm_domain.default_ioreq_server &&
> > +             s->id == id )
> > +        {
> > +            spin_lock(&s->pglist_hash_lock);
> > +
> > +            for ( i = 0; i < nr_pages; i++ )
> > +            {
> > +                rc = p2m_change_type_one(d, gpfn[i], ot, nt);
> > +                if ( !rc )
> > +                {
> > +                    if ( set )
> > +                        rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base,
> gpfn[i]);
> > +                    else
> > +                        rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base,
> gpfn[i]);
> > +                }
> > +
> > +                if ( rc )
> > +                    break;
> > +            }
> > +
> > +            spin_unlock(&s->pglist_hash_lock);
> > +            break;
> > +        }
> > +    }
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    return rc;
> > +}
> > +
> >  static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct
> > vcpu *v)  {
> >      struct hvm_ioreq_server *s;
> > @@ -2349,6 +2514,8 @@ static struct hvm_ioreq_server
> *hvm_select_ioreq_server(struct domain *d,
> >          switch ( type )
> >          {
> >              unsigned long end;
> > +            uint64_t gpfn;
> > +            struct pglist_hash_table *he;
> >
> >          case IOREQ_TYPE_PIO:
> >              end = addr + p->size - 1; @@ -2361,6 +2528,14 @@ static
> > struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> >              if ( rangeset_contains_range(r, addr, end) )
> >                  return s;
> >
> > +            gpfn = addr >> PAGE_SHIFT;
> > +            spin_lock(&s->pglist_hash_lock);
> > +            he = hvm_ioreq_server_lookup_pglist_hash_table(s-
> >pglist_hash_base, gpfn);
> > +            spin_unlock(&s->pglist_hash_lock);
> > +
> > +            if ( he )
> > +                return s;
> > +
> >              break;
> >          case IOREQ_TYPE_PCI_CONFIG:
> >              if ( rangeset_contains_singleton(r, addr >> 32) ) @@
> > -5309,6 +5484,35 @@ static int hvmop_set_ioreq_server_state(
> >      return rc;
> >  }
> >
> > +static int hvmop_map_pages_to_ioreq_server(
> > +
> XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pages_to_ioreq_server_t)
> uop)
> > +{
> > +    xen_hvm_map_pages_to_ioreq_server_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d,
> HVMOP_map_pages_to_ioreq_server);
> > +    if ( rc != 0 )
> > +        goto out;
> > +
> > +    rc = hvm_map_pages_to_ioreq_server(d, op.id, op.set, op.nr_pages,
> > + op.page_list);
> > +
> > +out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> >  static int hvmop_destroy_ioreq_server(
> >      XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
> { @@
> > -5374,6 +5578,11 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >              guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
> >          break;
> >
> > +    case HVMOP_map_pages_to_ioreq_server:
> > +        rc = hvmop_map_pages_to_ioreq_server(
> > +            guest_handle_cast(arg, xen_hvm_map_pages_to_ioreq_server_t));
> > +        break;
> > +
> >      case HVMOP_destroy_ioreq_server:
> >          rc = hvmop_destroy_ioreq_server(
> >              guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> > diff --git a/xen/include/asm-x86/hvm/domain.h
> > b/xen/include/asm-x86/hvm/domain.h
> > index 291a2e0..c28fbbe 100644
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -48,6 +48,11 @@ struct hvm_ioreq_vcpu {
> >      evtchn_port_t    ioreq_evtchn;
> >  };
> >
> > +struct pglist_hash_table {
> > +    struct pglist_hash_table *next;
> > +    uint64_t gpfn;
> > +};
> > +
> >  #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)  #define
> > MAX_NR_IO_RANGES  256
> >
> > @@ -70,6 +75,10 @@ struct hvm_ioreq_server {
> >      evtchn_port_t          bufioreq_evtchn;
> >      struct rangeset        *range[NR_IO_RANGE_TYPES];
> >      bool_t                 enabled;
> > +
> > +    /* scatter page list need write protection */
> > +    struct pglist_hash_table   *pglist_hash_base;
> > +    spinlock_t             pglist_hash_lock;
> >  };
> >
> >  struct hvm_domain {
> > diff --git a/xen/include/public/hvm/hvm_op.h
> > b/xen/include/public/hvm/hvm_op.h index eeb0a60..f7c989a 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state
> {  typedef
> > struct xen_hvm_set_ioreq_server_state
> > xen_hvm_set_ioreq_server_state_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
> >
> > +#define HVMOP_map_pages_to_ioreq_server 23
> > +#define XEN_IOREQSVR_MAX_MAP_BATCH      128
> > +struct xen_hvm_map_pages_to_ioreq_server {
> > +    domid_t domid;   /* IN - domain to be serviced */
> > +    ioservid_t id;   /* IN - server id */
> > +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
> > +    uint16_t nr_pages;  /* IN - page count */
> > +    uint64_t page_list[XEN_IOREQSVR_MAX_MAP_BATCH]; /* IN - gpfn list
> > +*/
> 
> No arbitrary limits like this in the API please.  Use a GUEST_HANDLE_64
> instead.
Ok.


> 
> ~Andrew
> 
> > +};
> > +typedef struct xen_hvm_map_pages_to_ioreq_server
> > +xen_hvm_map_pages_to_ioreq_server_t;
> >
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pages_to_ioreq_server_t);
> > +
> >  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
> >
> >  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-26  8:40     ` Ye, Wei
@ 2014-08-26  9:37       ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2014-08-26  9:37 UTC (permalink / raw)
  To: Ye, Wei, xen-devel@lists.xen.org
  Cc: Tian, Kevin, keir@xen.org, ian.campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, tim@xen.org,
	ian.jackson@eu.citrix.com, Dugger, Donald D,
	Paul.Durrant@citrix.com, Lv, Zhiyuan, JBeulich@suse.com,
	Zhang, Yang Z

On 26/08/14 09:40, Ye, Wei wrote:
>
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, August 22, 2014 6:35 PM
>> To: Ye, Wei; xen-devel@lists.xen.org
>> Cc: Tian, Kevin; keir@xen.org; ian.campbell@citrix.com;
>> stefano.stabellini@eu.citrix.com; tim@xen.org; ian.jackson@eu.citrix.com;
>> Dugger, Donald D; Paul.Durrant@citrix.com; Lv, Zhiyuan; JBeulich@suse.com;
>> Zhang, Yang Z
>> Subject: Re: [Xen-devel] [PATCH v2 2/2] ioreq-server: Support scatter page
>> forwarding
>>
>> On 22/08/14 20:18, Wei Ye wrote:
>>> Extend the interface to support add/remove scatter page list to be
>>> forwarded by a dedicated ioreq-server instance. Check and select a
>>> right ioreq-server instance for forwarding the write operation for a
>>> write protected page.
>>>
>>> Signed-off-by: Wei Ye <wei.ye@intel.com>
>>> ---
>>>  tools/libxc/xc_domain.c          |   32 ++++++
>>>  tools/libxc/xenctrl.h            |   18 ++++
>>>  xen/arch/x86/hvm/hvm.c           |  209
>> ++++++++++++++++++++++++++++++++++++++
>>>  xen/include/asm-x86/hvm/domain.h |    9 ++
>>>  xen/include/public/hvm/hvm_op.h  |   12 +++
>>>  5 files changed, 280 insertions(+)
>>>
>>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index
>>> 37ed141..36e4e59 100644
>>> --- a/tools/libxc/xc_domain.c
>>> +++ b/tools/libxc/xc_domain.c
>>> @@ -1485,6 +1485,38 @@ int
>> xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t
>> domid,
>>>      return rc;
>>>  }
>>>
>>> +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t
>> domid,
>>> +                                     ioservid_t id, uint16_t set,
>>> +                                     uint16_t nr_pages, uint64_t
>>> +*gpfn) {
>>> +    DECLARE_HYPERCALL;
>>> +
>> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t,
>> arg);
>>> +    int pg, rc = -1;
>>> +
>>> +    if ( arg == NULL )
>>> +        return -1;
>> You must set errno before exiting -1.
>>
> I'm not sure what kind of errno shoud be set. I just follow the similar existing functions like
> "xc_hvm_map_io_range_to_ioreq_server...", it also didn't set errno before exiting. What's
> Your suggestion for the errno?

The error handling/consistency in libxc is admittedly appalling, but new
code should not be making it any worse.

In this case, EFAULT might be acceptable, as NULL is a bad address to pass.

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  2014-08-22 10:35   ` Andrew Cooper
@ 2014-08-26 11:35   ` Paul Durrant
  2014-08-26 13:26   ` Jan Beulich
  2 siblings, 0 replies; 14+ messages in thread
From: Paul Durrant @ 2014-08-26 11:35 UTC (permalink / raw)
  To: Wei Ye, xen-devel@lists.xen.org
  Cc: Kevin Tian, Keir (Xen.org), Ian Campbell, Tim (Xen.org),
	donald.d.dugger@intel.com, Stefano Stabellini,
	zhiyuan.lv@intel.com, JBeulich@suse.com, yang.z.zhang@intel.com,
	Ian Jackson

> -----Original Message-----
> From: Wei Ye [mailto:wei.ye@intel.com]
> Sent: 22 August 2014 20:19
> To: xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; Ian Campbell; Paul Durrant; Ian Jackson; Stefano
> Stabellini; donald.d.dugger@intel.com; Kevin Tian; yang.z.zhang@intel.com;
> zhiyuan.lv@intel.com; konrad.wilk@oracle.com; Keir (Xen.org); Tim
> (Xen.org); Wei Ye
> Subject: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
> 
> Extend the interface to support add/remove scatter page list to be
> forwarded by a dedicated ioreq-server instance. Check and select
> a right ioreq-server instance for forwarding the write operation
> for a write protected page.
> 
> Signed-off-by: Wei Ye <wei.ye@intel.com>
> ---
>  tools/libxc/xc_domain.c          |   32 ++++++
>  tools/libxc/xenctrl.h            |   18 ++++
>  xen/arch/x86/hvm/hvm.c           |  209
> ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/domain.h |    9 ++
>  xen/include/public/hvm/hvm_op.h  |   12 +++
>  5 files changed, 280 insertions(+)
> 
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 37ed141..36e4e59 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1485,6 +1485,38 @@ int
> xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t
> domid,
>      return rc;
>  }
> 
> +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t
> domid,
> +                                     ioservid_t id, uint16_t set,
> +                                     uint16_t nr_pages, uint64_t *gpfn)
> +{
> +    DECLARE_HYPERCALL;
> +
> DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t,
> arg);
> +    int pg, rc = -1;
> +
> +    if ( arg == NULL )
> +        return -1;
> +
> +    if ( nr_pages > XEN_IOREQSVR_MAX_MAP_BATCH )
> +        goto out;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_pages_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->set = set;
> +    arg->nr_pages = nr_pages;
> +    for ( pg = 0; pg < nr_pages; pg++ )
> +        arg->page_list[pg] = gpfn[pg];
> +
> +    rc = do_xen_hypercall(xch, &hypercall);
> +
> +out:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>                                  domid_t domid,
>                                  ioservid_t id)
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index b55d857..84e8465 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1943,6 +1943,24 @@ int
> xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
>                                            uint8_t function);
> 
>  /**
> + * This function registers/deregisters a set of pages to be write protected.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm set whether registers or deregisters: 1 for register, 0 for
> deregister
> + * @parm nr_pages the count of pages
> + * @parm pgfn the guest page frame number list
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch,
> +                                          domid_t domid,
> +                                          ioservid_t id,
> +                                          uint16_t set,
> +                                          uint16_t nr_pages,
> +                                          uint64_t *gpfn);
> +
> +/**
>   * This function destroys an IOREQ Server.
>   *
>   * @parm xch a handle to an open hypervisor interface.
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4984149..bbbacc3 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -68,6 +68,12 @@
>  #include <public/mem_event.h>
>  #include <xen/rangeset.h>
> 
> +#define PGLIST_HASH_SIZE_SHIFT    8
> +#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
> +#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
> +#define PGLIST_INVALID_GPFN       0
> +#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
> +

Why are you persisting with the hash? I may have missed something as I was away on vacation, but Jan's query about why your v1 version of this patch was not using rangesets doesn't seem to have been addresses. I would have thought you could unify mapping of pages and mmio ranges fairly easily, which would probably massively reduce the size of this patch.

  Paul

>  bool_t __read_mostly hvm_enabled;
> 
>  unsigned int opt_hvm_debug_level __read_mostly;
> @@ -744,6 +750,98 @@ static void
> hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>      spin_unlock(&s->lock);
>  }
> 
> +static struct pglist_hash_table
> *hvm_ioreq_server_lookup_pglist_hash_table(
> +                          struct pglist_hash_table *pglist_ht, uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *entry;
> +
> +    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
> +        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )
> +            break;
> +
> +    return entry;
> +}
> +
> +static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table
> *chain)
> +{
> +    struct pglist_hash_table *p = chain;
> +    struct pglist_hash_table *n;
> +
> +    while ( p )
> +    {
> +        n = p->next;
> +        xfree(p);
> +        p = n;
> +    }
> +}
> +
> +static void hvm_ioreq_server_free_pglist_hash(struct pglist_hash_table
> *pglist_ht)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < PGLIST_HASH_SIZE; i++ )
> +        if ( pglist_ht[i].next != NULL )
> +            hvm_ioreq_server_free_hash_chain(pglist_ht[i].next);
> +}
> +
> +static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table
> *pglist_ht,
> +                                            uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *ne;
> +
> +    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) != NULL
> )
> +        return -EINVAL;
> +
> +    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
> +        pglist_ht[index].gpfn = gpfn;
> +    else
> +    {
> +        ne = xmalloc(struct pglist_hash_table);
> +        if ( !ne )
> +            return -ENOMEM;
> +        ne->next = pglist_ht[index].next;
> +        ne->gpfn = gpfn;
> +        pglist_ht[index].next = ne;
> +    }
> +
> +    return 0;
> +}
> +
> +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table
> *pglist_ht,
> +                                            uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *next, *prev;
> +
> +    if ( pglist_ht[index].gpfn == gpfn )
> +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> +    else
> +    {
> +        prev = &pglist_ht[index];
> +        while ( 1 )
> +        {
> +            next = prev->next;
> +            if ( !next )
> +            {
> +                printk(XENLOG_G_WARNING
> "hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
> +                       " %lx not found\n", pglist_ht, gpfn);
> +                return -EINVAL;
> +            }
> +            if ( next->gpfn == gpfn )
> +            {
> +                prev->next = next->next;
> +                xfree(next);
> +                break;
> +            }
> +            prev = next;
> +         }
> +    }
> +
> +    return 0;
> +}
> +
>  static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
>                                        bool_t is_default, bool_t handle_bufioreq)
>  {
> @@ -948,7 +1046,14 @@ static int hvm_ioreq_server_init(struct
> hvm_ioreq_server *s, struct domain *d,
>      spin_lock_init(&s->lock);
>      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
>      spin_lock_init(&s->bufioreq_lock);
> +    spin_lock_init(&s->pglist_hash_lock);
> 
> +    rc = -ENOMEM;
> +    s->pglist_hash_base = xmalloc_array(struct pglist_hash_table,
> PGLIST_HASH_SIZE);
> +    if ( s->pglist_hash_base == NULL )
> +        goto fail1;
> +    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE *
> PGLIST_HASH_SIZE);
> +
>      rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
>      if ( rc )
>          goto fail1;
> @@ -1087,6 +1192,9 @@ static int hvm_destroy_ioreq_server(struct domain
> *d, ioservid_t id)
> 
>          hvm_ioreq_server_deinit(s, 0);
> 
> +        hvm_ioreq_server_free_pglist_hash(s->pglist_hash_base);
> +        xfree(s->pglist_hash_base);
> +
>          domain_unpause(d);
> 
>          xfree(s);
> @@ -1279,6 +1387,63 @@ static int hvm_set_ioreq_server_state(struct
> domain *d, ioservid_t id,
>      return rc;
>  }
> 
> +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                         uint16_t set, uint16_t nr_pages,
> +                                         uint64_t *gpfn)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +    unsigned int i;
> +    p2m_type_t ot, nt;
> +
> +    if ( set )
> +    {
> +        ot = p2m_ram_rw;
> +        nt = p2m_mmio_write_dm;
> +    }
> +    else
> +    {
> +        ot = p2m_mmio_write_dm;
> +        nt = p2m_ram_rw;
> +    }
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s != d->arch.hvm_domain.default_ioreq_server &&
> +             s->id == id )
> +        {
> +            spin_lock(&s->pglist_hash_lock);
> +
> +            for ( i = 0; i < nr_pages; i++ )
> +            {
> +                rc = p2m_change_type_one(d, gpfn[i], ot, nt);
> +                if ( !rc )
> +                {
> +                    if ( set )
> +                        rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base,
> gpfn[i]);
> +                    else
> +                        rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base,
> gpfn[i]);
> +                }
> +
> +                if ( rc )
> +                    break;
> +            }
> +
> +            spin_unlock(&s->pglist_hash_lock);
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    return rc;
> +}
> +
>  static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu
> *v)
>  {
>      struct hvm_ioreq_server *s;
> @@ -2349,6 +2514,8 @@ static struct hvm_ioreq_server
> *hvm_select_ioreq_server(struct domain *d,
>          switch ( type )
>          {
>              unsigned long end;
> +            uint64_t gpfn;
> +            struct pglist_hash_table *he;
> 
>          case IOREQ_TYPE_PIO:
>              end = addr + p->size - 1;
> @@ -2361,6 +2528,14 @@ static struct hvm_ioreq_server
> *hvm_select_ioreq_server(struct domain *d,
>              if ( rangeset_contains_range(r, addr, end) )
>                  return s;
> 
> +            gpfn = addr >> PAGE_SHIFT;
> +            spin_lock(&s->pglist_hash_lock);
> +            he = hvm_ioreq_server_lookup_pglist_hash_table(s-
> >pglist_hash_base, gpfn);
> +            spin_unlock(&s->pglist_hash_lock);
> +
> +            if ( he )
> +                return s;
> +
>              break;
>          case IOREQ_TYPE_PCI_CONFIG:
>              if ( rangeset_contains_singleton(r, addr >> 32) )
> @@ -5309,6 +5484,35 @@ static int hvmop_set_ioreq_server_state(
>      return rc;
>  }
> 
> +static int hvmop_map_pages_to_ioreq_server(
> +
> XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pages_to_ioreq_server_t)
> uop)
> +{
> +    xen_hvm_map_pages_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d,
> HVMOP_map_pages_to_ioreq_server);
> +    if ( rc != 0 )
> +        goto out;
> +
> +    rc = hvm_map_pages_to_ioreq_server(d, op.id, op.set, op.nr_pages,
> op.page_list);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  static int hvmop_destroy_ioreq_server(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
>  {
> @@ -5374,6 +5578,11 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>              guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
>          break;
> 
> +    case HVMOP_map_pages_to_ioreq_server:
> +        rc = hvmop_map_pages_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_pages_to_ioreq_server_t));
> +        break;
> +
>      case HVMOP_destroy_ioreq_server:
>          rc = hvmop_destroy_ioreq_server(
>              guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> index 291a2e0..c28fbbe 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -48,6 +48,11 @@ struct hvm_ioreq_vcpu {
>      evtchn_port_t    ioreq_evtchn;
>  };
> 
> +struct pglist_hash_table {
> +    struct pglist_hash_table *next;
> +    uint64_t gpfn;
> +};
> +
>  #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
>  #define MAX_NR_IO_RANGES  256
> 
> @@ -70,6 +75,10 @@ struct hvm_ioreq_server {
>      evtchn_port_t          bufioreq_evtchn;
>      struct rangeset        *range[NR_IO_RANGE_TYPES];
>      bool_t                 enabled;
> +
> +    /* scatter page list need write protection */
> +    struct pglist_hash_table   *pglist_hash_base;
> +    spinlock_t             pglist_hash_lock;
>  };
> 
>  struct hvm_domain {
> diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> index eeb0a60..f7c989a 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state
> xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
> 
> +#define HVMOP_map_pages_to_ioreq_server 23
> +#define XEN_IOREQSVR_MAX_MAP_BATCH      128
> +struct xen_hvm_map_pages_to_ioreq_server {
> +    domid_t domid;   /* IN - domain to be serviced */
> +    ioservid_t id;   /* IN - server id */
> +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
> +    uint16_t nr_pages;  /* IN - page count */
> +    uint64_t page_list[XEN_IOREQSVR_MAX_MAP_BATCH]; /* IN - gpfn list
> */
> +};
> +typedef struct xen_hvm_map_pages_to_ioreq_server
> xen_hvm_map_pages_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pages_to_ioreq_server_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
> 
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] ioreq-server: Support scatter page forwarding
  2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  2014-08-22 10:35   ` Andrew Cooper
  2014-08-26 11:35   ` Paul Durrant
@ 2014-08-26 13:26   ` Jan Beulich
  2 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2014-08-26 13:26 UTC (permalink / raw)
  To: Wei Ye
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, donald.d.dugger, xen-devel, Paul.Durrant, zhiyuan.lv,
	yang.z.zhang

>>> On 22.08.14 at 21:18, <wei.ye@intel.com> wrote:

Independent of my earlier and Paul's recent question regarding the
need for all this code, a couple of mechanical comments:

> @@ -744,6 +750,98 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>      spin_unlock(&s->lock);
>  }
>  
> +static struct pglist_hash_table *hvm_ioreq_server_lookup_pglist_hash_table(
> +                          struct pglist_hash_table *pglist_ht, uint64_t gpfn)
> +{
> +    unsigned int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *entry;
> +
> +    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
> +        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )

Please write efficient code: By making sure up front that gpfn isn't
PGLIST_INVALID_GPFN, the left side of the && (executed on every
loop iteration) can be dropped.

> +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                         uint16_t set, uint16_t nr_pages,

Pointless uses of uint16_t. The first one is bool_t, the second
unsigned int.

> +                                         uint64_t *gpfn)

And this one would conventionally by unsigned long * or xen_pfn_t *.

> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +    unsigned int i;
> +    p2m_type_t ot, nt;
> +
> +    if ( set )
> +    {
> +        ot = p2m_ram_rw;
> +        nt = p2m_mmio_write_dm;
> +    }
> +    else
> +    {
> +        ot = p2m_mmio_write_dm;
> +        nt = p2m_ram_rw;
> +    }
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s != d->arch.hvm_domain.default_ioreq_server &&
> +             s->id == id )
> +        {
> +            spin_lock(&s->pglist_hash_lock);
> +
> +            for ( i = 0; i < nr_pages; i++ )

Up to 64k iterations without preemption is already quite a lot. Did
you measure how much worst case input would take to process?
(Note that I raised this question before without you addressing it.)

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state 
> xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>  
> +#define HVMOP_map_pages_to_ioreq_server 23
> +#define XEN_IOREQSVR_MAX_MAP_BATCH      128
> +struct xen_hvm_map_pages_to_ioreq_server {
> +    domid_t domid;   /* IN - domain to be serviced */
> +    ioservid_t id;   /* IN - server id */
> +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */

The comment doesn't match the implementation.

> +    uint16_t nr_pages;  /* IN - page count */
> +    uint64_t page_list[XEN_IOREQSVR_MAX_MAP_BATCH]; /* IN - gpfn list */

Ah, so the iteration limit above really is XEN_IOREQSVR_MAX_MAP_BATCH.
That's fine then, except that you need to bound check the input
value before iterating.

However I wonder whether you really want to bake such a fixed
limit into the hypercall. Using a handle here and limiting the count
to a (currently) sane maximum in the implementation would then
allow for bumping the count in the future.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-08-26 13:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-22 19:18 [PATCH v2 0/2] Extend ioreq-server to support page write protection Wei Ye
2014-08-22 19:18 ` [PATCH v2 1/2] x86: add p2m_mmio_write_dm Wei Ye
2014-08-22 10:13   ` Andrew Cooper
2014-08-26  3:18     ` Ye, Wei
2014-08-22 13:24   ` David Vrabel
2014-08-25 22:17     ` Tian, Kevin
2014-08-25 10:46   ` Jan Beulich
2014-08-25 22:25     ` Tian, Kevin
2014-08-22 19:18 ` [PATCH v2 2/2] ioreq-server: Support scatter page forwarding Wei Ye
2014-08-22 10:35   ` Andrew Cooper
2014-08-26  8:40     ` Ye, Wei
2014-08-26  9:37       ` Andrew Cooper
2014-08-26 11:35   ` Paul Durrant
2014-08-26 13:26   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).