xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature
@ 2014-09-05  8:37 Chao Peng
  2014-09-05  8:37 ` [PATCH v15 01/11] multicall: add no preemption ability between two calls Chao Peng
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Changes from v14:
 - Address comments from Jan and Andrew, including:
   * Add non-preemption ability to multicall;
   * Build the resource batch operation on top of multicall;
   * Simplify pqos option.

Changes from v13:
 - Address comments from Jan and Andrew, including:
   * Support mixed resource types in one invocation;
   * Remove some unused fields(rmid_min/rmid_inuse);
   * Other minor changes and code clean up;

Changes from v12:
 - Address comments from Jan, like QoS feature setting when booting,
   avoid unbound memory allocation in Xen, put resource access
   hypercall in platform_hypercall.c to avoid creating new files, 
   specifically enumerate L3 cache size for CQM (instead of
   x86_cache_size), get random socket CPU in user space tools, and 
   also some coding styles. However the continue_hypercall_on_cpu()
   suggestion is not adopted in this version due to the potential
   issue in our usage case.
 - Add a white list to limit the capability for resource access from
   tool side.
 - Address comments from Ian on the xl/libxl/libxc side.

Changes from v11:
 - Turn off pqos and pqos_monitor in Xen command line by default.
 - Modify the original specific MSR access hypercall into a generic
   resource access hypercall. This hypercall could be used to access
   MSR, Port I/O, etc. Use platform_op to replace sysctl so that both
   dom0 kernel and userspace could use this hypercall.
 - Address various comments from Jan, Ian, Konrad, and Daniel.

Changes from v10:
 - Re-design and re-implement the whole logic. In this version,
   hypervisor provides basic mechanisms (like access MSRs) while all
   policies are put in user space.
   patch 1-3 provide a generic MSR hypercall for toolstack to access
   patch 4-9 implement the cache QoS monitoring feature

Changes from v9:
 - Revise the readonly mapping mechanism to share data between Xen and
   userspace. We create L3C buffer for each socket, share both buffer
   address MFNs and buffer MFNs to userspace.
 - Split the pqos.c into pqos.c and cqm.c for better code structure.
 - Show the total L3 cache size when issueing xl pqos-list cqm command.
 - Abstract a libxl_getcqminfo() function to fetch cqm data from Xen.
 - Several coding style fixes.

Changes from v8:
 - Address comments from Ian Campbell, including:
   * Modify the return handling for xc_sysctl();
   * Add man page items for platform QoS related commands.
   * Fix typo in commit message.

Changes from v7:
 - Address comments from Andrew Cooper, including:
   * Check CQM capability before allocating cpumask memory.
   * Move one function declaration into the correct patch.

Changes from v6:
 - Address comments from Jan Beulich, including:
   * Remove the unnecessary CPUID feature check.
   * Remove the unnecessary socket_cpu_map.
   * Spin_lock related changes, avoid spin_lock_irqsave().
   * Use readonly mapping to pass cqm data between Xen/Userspace,
     to avoid data copying.
   * Optimize RDMSR/WRMSR logic to avoid unnecessary calls.
   * Misc fixes including __read_mostly prefix, return value, etc.

Changes from v5:
 - Address comments from Dario Faggioli, including:
   * Define a new libxl_cqminfo structure to avoid reference of xc
     structure in libxl functions.
   * Use LOGE() instead of the LIBXL__LOG() functions.

Changes from v4:
 - When comparing xl cqm parameter, use strcmp instead of strncmp,
   otherwise, "xl pqos-attach cqmabcd domid" will be considered as
   a valid command line.
 - Address comments from Andrew Cooper, including:
   * Adjust the pqos parameter parsing function.
   * Modify the pqos related documentation.
   * Add a check for opt_cqm_max_rmid in initialization code.
   * Do not IPI CPU that is in same socket with current CPU.
 - Address comments from Dario Faggioli, including:
   * Fix an typo in export symbols.
   * Return correct libxl error code for qos related functions.
   * Abstract the error printing logic into a function.
 - Address comment from Daniel De Graaf, including:
   * Add return value in for pqos related check.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Modify the GPLv2 related file header, remove the address.

Changes from v3:
 - Use structure to better organize CQM related global variables.
 - Address comments from Andrew Cooper, including:
   * Remove the domain creation flag for CQM RMID allocation.
   * Adjust the boot parameter format, use custom_param().
   * Add documentation for the new added boot parameter.
   * Change QoS type flag to be uint64_t.
   * Initialize the per socket cpu bitmap in system boot time.
   * Remove get_cqm_avail() function.
   * Misc of format changes.
 - Address comment from Daniel De Graaf, including:
   * Use avc_current_has_perm() for XEN2__PQOS_OP that belongs to SECCLASS_XEN2.

Changes from v2:
 - Address comments from Andrew Cooper, including:
   * Merging tools stack changes into one patch.
   * Reduce the IPI number to one per socket.
   * Change structures for CQM data exchange between tools and Xen.
   * Misc of format/variable/function name changes.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Simplify the error printing logic.
   * Add xsm check for the new added hypercalls.

Changes from v1:
 - Address comments from Andrew Cooper, including:
   * Change function names, e.g., alloc_cqm_rmid(), system_supports_cqm(), etc.
   * Change some structure element order to save packing cost.
   * Correct some function's return value.
   * Some programming styles change.
   * ...

Future generations of Intel Xeon processor may offer monitoring capability in
each logical processor to measure specific quality-of-service metric,
for example, the Cache QoS Monitoring to get L3 cache occupancy.
Detailed information please refer to Intel SDM chapter 17.14.

Cache QoS Monitoring provides a layer of abstraction between applications and
logical processors through the use of Resource Monitoring IDs (RMIDs).
In Xen design, each guest in the system can be assigned an RMID independently,
while RMID=0 is reserved for monitoring domains that doesn't enable CQM service.
When any of the domain's vcpu is scheduled on a logical processor, the domain's
RMID will be activated by programming the value into one specific MSR, and when
the vcpu is scheduled out, a RMID=0 will be programmed into that MSR.
The Cache QoS Hardware tracks cache utilization of memory accesses according to
the RMIDs and reports monitored data via a counter register. With this solution,
we can get the knowledge how much L3 cache is used by a certain guest.

To attach QoS monitoring service to a certain guest:
xl pqos-monitor-attach domid

To detached CQM service from a guest:
xl pqos-monitor-detach domid

To get the L3 cache usage:
$ xl pqos-monitor-show cache_occupancy <domid>

The below data is just an example showing how the CQM related data is exposed to
end user.

[root@localhost]# xl pqos-monitor-show cache_occupancy
Total RMID: 55
Per-Socket L3 Cache Size: 35840 KB
Name                                        ID        Socket 0        Socket 1
Domain-0                                     0        20720 KB        15960 KB
ExampleHVMDomain                             1         4200 KB         2352 KB

Chao Peng (11):
  multicall: add no preemption ability between two calls
  x86: add generic resource (e.g. MSR) access hypercall
  xsm: add resource operation related xsm policy
  tools: provide interface for generic resource access
  x86: detect and initialize Platform QoS Monitoring feature
  x86: dynamically attach/detach QoS monitoring service for a guest
  x86: collect global QoS monitoring information
  x86: enable QoS monitoring for each domain RMID
  x86: add QoS monitoring related MSRs in allowed list
  xsm: add platform QoS related xsm policies
  tools: CMDs and APIs for Platform QoS Monitoring

 docs/man/xl.pod.1                            |   24 +++
 docs/misc/xen-command-line.markdown          |   14 ++
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    6 +-
 tools/libxc/Makefile                         |    2 +
 tools/libxc/xc_msr_x86.h                     |   36 +++++
 tools/libxc/xc_pqos.c                        |  210 ++++++++++++++++++++++++++
 tools/libxc/xc_private.h                     |   21 +++
 tools/libxc/xc_resource.c                    |   92 +++++++++++
 tools/libxc/xenctrl.h                        |   29 ++++
 tools/libxl/Makefile                         |    2 +-
 tools/libxl/libxl.h                          |   19 +++
 tools/libxl/libxl_pqos.c                     |  184 ++++++++++++++++++++++
 tools/libxl/libxl_types.idl                  |    4 +
 tools/libxl/libxl_utils.c                    |   28 ++++
 tools/libxl/xl.h                             |    3 +
 tools/libxl/xl_cmdimpl.c                     |  131 ++++++++++++++++
 tools/libxl/xl_cmdtable.c                    |   17 +++
 xen/arch/x86/Makefile                        |    1 +
 xen/arch/x86/cpu/intel_cacheinfo.c           |   49 +-----
 xen/arch/x86/domain.c                        |    8 +
 xen/arch/x86/domctl.c                        |   29 ++++
 xen/arch/x86/platform_hypercall.c            |   70 +++++++++
 xen/arch/x86/pqos.c                          |  181 ++++++++++++++++++++++
 xen/arch/x86/setup.c                         |    3 +
 xen/arch/x86/sysctl.c                        |   43 ++++++
 xen/common/multicall.c                       |    5 +-
 xen/include/asm-x86/cpufeature.h             |   46 ++++++
 xen/include/asm-x86/domain.h                 |    2 +
 xen/include/asm-x86/msr-index.h              |    5 +
 xen/include/asm-x86/pqos.h                   |   63 ++++++++
 xen/include/public/domctl.h                  |   12 ++
 xen/include/public/platform.h                |   15 ++
 xen/include/public/sysctl.h                  |   14 ++
 xen/include/public/xen.h                     |    4 +
 xen/include/xlat.lst                         |    1 +
 xen/xsm/flask/hooks.c                        |   11 ++
 xen/xsm/flask/policy/access_vectors          |   18 ++-
 xen/xsm/flask/policy/security_classes        |    1 +
 39 files changed, 1351 insertions(+), 54 deletions(-)
 create mode 100644 tools/libxc/xc_msr_x86.h
 create mode 100644 tools/libxc/xc_pqos.c
 create mode 100644 tools/libxc/xc_resource.c
 create mode 100644 tools/libxl/libxl_pqos.c
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05 10:46   ` Andrew Cooper
  2014-09-05  8:37 ` [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall Chao Peng
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add a flag to indicate if the execution can be preempted between two
calls. If not specified, stay preemptable.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 xen/common/multicall.c   |    5 ++++-
 xen/include/public/xen.h |    4 ++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/common/multicall.c b/xen/common/multicall.c
index fa9d910..83b96eb 100644
--- a/xen/common/multicall.c
+++ b/xen/common/multicall.c
@@ -40,6 +40,7 @@ do_multicall(
     struct mc_state *mcs = &current->mc_state;
     uint32_t         i;
     int              rc = 0;
+    bool_t           preemptable = 0;
 
     if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
     {
@@ -52,7 +53,7 @@ do_multicall(
 
     for ( i = 0; !rc && i < nr_calls; i++ )
     {
-        if ( i && hypercall_preempt_check() )
+        if ( preemptable && hypercall_preempt_check() )
             goto preempted;
 
         if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
@@ -61,6 +62,8 @@ do_multicall(
             break;
         }
 
+        preemptable = mcs->call.flags & MC_NO_PREEMPT;
+
         trace_multicall_call(&mcs->call);
 
         do_multicall_call(&mcs->call);
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index a6a2092..3f8b908 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -550,10 +550,14 @@ DEFINE_XEN_GUEST_HANDLE(mmu_update_t);
 struct multicall_entry {
     xen_ulong_t op, result;
     xen_ulong_t args[6];
+    uint32_t flags;
 };
 typedef struct multicall_entry multicall_entry_t;
 DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
 
+/* These flags passed in the 'flags' field of multicall_entry_t. */
+#define MC_NO_PREEMPT    (1<<0)  /* Can't be preempted before next entry? */
+
 #if __XEN_INTERFACE_VERSION__ < 0x00040400
 /*
  * Event channel endpoints per domain (when using the 2-level ABI):
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
  2014-09-05  8:37 ` [PATCH v15 01/11] multicall: add no preemption ability between two calls Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05 10:59   ` Andrew Cooper
  2014-09-05  8:37 ` [PATCH v15 03/11] xsm: add resource operation related xsm policy Chao Peng
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add a generic resource access hypercall for tool stack or other
components, e.g., accessing MSR, port I/O, etc.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 xen/arch/x86/platform_hypercall.c |   63 +++++++++++++++++++++++++++++++++++++
 xen/include/public/platform.h     |   15 +++++++++
 xen/include/xlat.lst              |    1 +
 3 files changed, 79 insertions(+)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 2162811..e5ad4c9 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -61,6 +61,42 @@ long cpu_down_helper(void *data);
 long core_parking_helper(void *data);
 uint32_t get_cur_idle_nums(void);
 
+struct xen_resource_access {
+    int32_t ret;
+    struct xenpf_resource_op *data;
+};
+
+static bool_t allow_access_msr(unsigned int msr)
+{
+    return 0;
+}
+
+static void resource_access_one(void *info)
+{
+    struct xen_resource_access *ra = info;
+    int ret = 0;
+
+    switch ( ra->data->cmd )
+    {
+    case XEN_RESOURCE_OP_MSR_READ:
+    case XEN_RESOURCE_OP_MSR_WRITE:
+        if ( ra->data->idx >> 32 )
+            ret = -EINVAL;
+        else if ( !allow_access_msr(ra->data->idx) )
+            ret = -EACCES;
+        else if ( ra->data->cmd == XEN_RESOURCE_OP_MSR_READ )
+            ret = rdmsr_safe(ra->data->idx, ra->data->val);
+        else
+            ret = wrmsr_safe(ra->data->idx, ra->data->val);
+        break;
+    default:
+        ret = -EINVAL;
+        break;
+    }
+
+    ra->ret = ret;
+}
+
 ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
 {
     ret_t ret = 0;
@@ -601,6 +637,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_resource_op:
+    {
+        struct xen_resource_access ra;
+        struct xenpf_resource_op *rsc_op = &op->u.resource_op;
+        unsigned int cpu = smp_processor_id();
+
+        ra.data = rsc_op;
+
+        if ( rsc_op->cpu == cpu )
+            resource_access_one(&ra);
+        else if ( cpu_online(rsc_op->cpu) )
+            on_selected_cpus(cpumask_of(rsc_op->cpu),
+                         resource_access_one, &ra, 1);
+        else
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        if ( ra.ret )
+        {
+            ret = ra.ret;
+            break;
+        }
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 053b9fa..3ed50de 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -527,6 +527,20 @@ struct xenpf_core_parking {
 typedef struct xenpf_core_parking xenpf_core_parking_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 
+#define XENPF_resource_op   61
+
+#define XEN_RESOURCE_OP_MSR_READ  0
+#define XEN_RESOURCE_OP_MSR_WRITE 1
+
+struct xenpf_resource_op {
+    uint16_t cmd;       /* XEN_RESOURCE_OP_* */
+    uint32_t cpu;
+    uint64_t idx;
+    uint64_t val;
+};
+typedef struct xenpf_resource_op xenpf_resource_op_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -553,6 +567,7 @@ struct xen_platform_op {
         struct xenpf_cpu_hotadd        cpu_add;
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
+        struct xenpf_resource_op       resource_op;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9a35dd7..06ed7b9 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -88,6 +88,7 @@
 ?	xenpf_enter_acpi_sleep		platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
+?	xenpf_resource_op		platform.h
 !	sched_poll			sched.h
 ?	sched_remote_shutdown		sched.h
 ?	sched_shutdown			sched.h
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 03/11] xsm: add resource operation related xsm policy
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
  2014-09-05  8:37 ` [PATCH v15 01/11] multicall: add no preemption ability between two calls Chao Peng
  2014-09-05  8:37 ` [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 04/11] tools: provide interface for generic resource access Chao Peng
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add xsm policies for resource access related hypercall, such as MSR
access, port I/O read/write, and other related resource operations.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Daniel De Graaf  <dgdegra@tycho.nsa.gov>
---
 tools/flask/policy/policy/modules/xen/xen.te |    3 +++
 xen/xsm/flask/hooks.c                        |    4 ++++
 xen/xsm/flask/policy/access_vectors          |   14 +++++++++++---
 xen/xsm/flask/policy/security_classes        |    1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bb59fe8..562b8df 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
 	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
 	tmem_control getscheduler setscheduler
 };
+allow dom0_t xen_t:xen2 {
+    resource_op
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index f2f59ea..fcfed25 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1383,6 +1383,10 @@ static int flask_platform_op(uint32_t op)
     case XENPF_get_cpuinfo:
         return domain_has_xen(current->domain, XEN__GETCPUINFO);
 
+    case XENPF_resource_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__RESOURCE_OP, NULL);
+
     default:
         printk("flask_platform_op: Unknown op %d\n", op);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 32371a9..b606441 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -3,9 +3,9 @@
 #
 # class class_name { permission_name ... }
 
-# Class xen consists of dom0-only operations dealing with the hypervisor itself.
-# Unless otherwise specified, the source is the domain executing the hypercall,
-# and the target is the xen initial sid (type xen_t).
+# Class xen and xen2 consists of dom0-only operations dealing with the
+# hypervisor itself. Unless otherwise specified, the source is the domain
+# executing the hypercall, and the target is the xen initial sid (type xen_t).
 class xen
 {
 # XENPF_settime
@@ -75,6 +75,14 @@ class xen
     setscheduler
 }
 
+# This is a continuation of class xen, since only 32 permissions can be
+# defined per class
+class xen2
+{
+# XENPF_resource_op
+    resource_op
+}
+
 # Classes domain and domain2 consist of operations that a domain performs on
 # another domain or on itself.  Unless otherwise specified, the source is the
 # domain executing the hypercall, and the target is the domain being operated on
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ef134a7..ca191db 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -8,6 +8,7 @@
 # for userspace object managers
 
 class xen
+class xen2
 class domain
 class domain2
 class hvm
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 04/11] tools: provide interface for generic resource access
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (2 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 03/11] xsm: add resource operation related xsm policy Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature Chao Peng
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Xen added a new platform_op hypercall for generic MSR access, and this
is the the tool side change to wrapper the hypercall into xc APIs.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/Makefile      |    1 +
 tools/libxc/xc_private.h  |   21 +++++++++++
 tools/libxc/xc_resource.c |   92 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h     |   12 ++++++
 4 files changed, 126 insertions(+)
 create mode 100644 tools/libxc/xc_resource.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 3b04027..dde6109 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -34,6 +34,7 @@ CTRL_SRCS-y       += xc_foreign_memory.c
 CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
+CTRL_SRCS-y       += xc_resource.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
 CTRL_SRCS-$(CONFIG_Linux) += xc_linux.c xc_linux_osdep.c
 CTRL_SRCS-$(CONFIG_FreeBSD) += xc_freebsd.c xc_freebsd_osdep.c
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index c50a7c9..3b98c57 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -310,6 +310,27 @@ static inline int do_sysctl(xc_interface *xch, struct xen_sysctl *sysctl)
     return ret;
 }
 
+static inline int do_multicall_op(xc_interface *xch,
+                                  xc_hypercall_buffer_t *call_list,
+                                  uint32_t nr_calls)
+{
+    int ret = -1;
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER_ARGUMENT(call_list);
+
+    hypercall.op     = __HYPERVISOR_multicall;
+    hypercall.arg[0] = HYPERCALL_BUFFER_AS_ARG(call_list);
+    hypercall.arg[1] = nr_calls;
+    if ( (ret = do_xen_hypercall(xch, &hypercall)) < 0 )
+    {
+        if ( errno == EACCES )
+            DPRINTF("multicall operation failed -- need to"
+                    " rebuild the user-space tool set?\n");
+    }
+
+    return ret;
+}
+
 int do_memory_op(xc_interface *xch, int cmd, void *arg, size_t len);
 
 void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
diff --git a/tools/libxc/xc_resource.c b/tools/libxc/xc_resource.c
new file mode 100644
index 0000000..f6d082c
--- /dev/null
+++ b/tools/libxc/xc_resource.c
@@ -0,0 +1,92 @@
+/*
+ * xc_resource.c
+ *
+ * Generic resource access API
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "xc_private.h"
+
+int xc_resource_op(xc_interface *xch, uint32_t nr, xc_resource_data_t *data)
+{
+    int rc, i;
+    xc_resource_data_t *rsc_data;
+    multicall_entry_t *call;
+    DECLARE_HYPERCALL_BUFFER(multicall_entry_t, call_list);
+    DECLARE_HYPERCALL_BUFFER(struct xen_platform_op, platform_op);
+    xc_hypercall_buffer_array_t *platform_op_list;
+
+    call_list = xc_hypercall_buffer_alloc(xch, call_list, sizeof(*call_list));
+    if ( !call_list )
+    {
+        return -1;
+    }
+
+    platform_op_list = xc_hypercall_buffer_array_create(xch, nr);
+    if ( !platform_op_list ) {
+        rc = -1;
+        goto out;
+    }
+
+    for ( i = 0; i < nr; i++ ) {
+        platform_op = xc_hypercall_buffer_array_alloc(xch, platform_op_list, i,
+                        platform_op, sizeof(struct xen_platform_op));
+        if ( !platform_op ) {
+            rc = -1;
+            goto out;
+        }
+        rsc_data = data + i;
+        call = call_list + i;
+
+        call->op = __HYPERVISOR_platform_op;
+        call->args[0] = HYPERCALL_BUFFER_AS_ARG(platform_op);
+        if ( rsc_data->flags & MC_NO_PREEMPT )
+            call->flags = MC_NO_PREEMPT;
+
+        platform_op->interface_version = XENPF_INTERFACE_VERSION;
+        platform_op->cmd = XENPF_resource_op;
+        platform_op->u.resource_op.cmd = rsc_data->rsc_op.cmd;
+        platform_op->u.resource_op.cpu = rsc_data->rsc_op.cpu;
+        platform_op->u.resource_op.idx = rsc_data->rsc_op.idx;
+        platform_op->u.resource_op.val = rsc_data->rsc_op.val;
+    }
+
+    rc = do_multicall_op(xch, HYPERCALL_BUFFER(call_list), nr);
+
+    for ( i = 0; i < nr; i++ ) {
+        xc_hypercall_buffer_array_get(xch, platform_op_list, i,
+                        platform_op, sizeof(struct xen_platform_op));
+        rsc_data = data + i;
+        call = call_list + i;
+
+        rsc_data->result = call->result;
+        rsc_data->rsc_op.val = platform_op->u.resource_op.val;
+    }
+
+out:
+    xc_hypercall_buffer_free(xch, call_list);
+    xc_hypercall_buffer_array_destroy(xch, platform_op_list);
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c5d0db..0fa0c12 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -47,6 +47,7 @@
 #include <xen/xsm/flask_op.h>
 #include <xen/tmem.h>
 #include <xen/kexec.h>
+#include <xen/platform.h>
 
 #include "xentoollog.h"
 
@@ -2643,6 +2644,17 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
  */
 int xc_kexec_unload(xc_interface *xch, int type);
 
+typedef xenpf_resource_op_t xc_resource_op_t;
+
+struct xc_resource_data {
+    uint64_t result;
+    uint32_t flags;
+    xc_resource_op_t rsc_op;
+};
+
+typedef struct xc_resource_data xc_resource_data_t;
+int xc_resource_op(xc_interface *xch, uint32_t nr, xc_resource_data_t *data);
+
 #endif /* XENCTRL_H */
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (3 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 04/11] tools: provide interface for generic resource access Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05 11:05   ` Andrew Cooper
  2014-09-05  8:37 ` [PATCH v15 06/11] x86: dynamically attach/detach QoS monitoring service for a guest Chao Peng
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Detect platform QoS feature status and enumerate the resource types,
one of which is to monitor the L3 cache occupancy.

Also introduce a Xen command line parameter to control the QoS
feature status.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 docs/misc/xen-command-line.markdown |   14 +++++
 xen/arch/x86/Makefile               |    1 +
 xen/arch/x86/pqos.c                 |  108 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c                |    3 +
 xen/include/asm-x86/cpufeature.h    |    1 +
 xen/include/asm-x86/pqos.h          |   53 +++++++++++++++++
 6 files changed, 180 insertions(+)
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index af93e17..fcda322 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1005,6 +1005,20 @@ This option can be specified more than once (up to 8 times at present).
 ### ple\_window
 > `= <integer>`
 
+### pqos (Intel)
+> `= List of ( pqos_monitor:<boolean> | rmid_max:<integer> )`
+
+> Default: `pqos=pqos_monitor:0,rmid_max:255`
+
+Configure platform QoS services, which are available on Intel Haswell Server
+family and future platforms.
+
+`pqos` instructs Xen to enable/disable platform QoS service.
+
+`pqos_monitor` instructs Xen to enable/disable QoS monitoring service.
+
+`rmid_max` indicates the max value for rmid.
+
 ### reboot
 > `= t[riple] | k[bd] | a[cpi] | p[ci] | n[o] [, [w]arm | [c]old]`
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index c1e244d..3d92fd7 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -59,6 +59,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += pqos.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
new file mode 100644
index 0000000..aca9ea2
--- /dev/null
+++ b/xen/arch/x86/pqos.c
@@ -0,0 +1,108 @@
+/*
+ * pqos.c: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <xen/init.h>
+#include <xen/cpu.h>
+#include <asm/pqos.h>
+
+#define PQOS_MONITOR        (1<<0)
+
+struct pqos_monitor *__read_mostly pqosm = NULL;
+static bool_t __initdata opt_pqos = 0;
+static unsigned int __initdata opt_rmid_max = 255;
+
+static void __init parse_pqos_param(char *s)
+{
+    char *ss, *val_str;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        val_str = strchr(s, ':');
+        if ( val_str )
+            *val_str++ = '\0';
+
+        if ( !strcmp(s, "pqos_monitor")
+             && ( !val_str || parse_bool(val_str) == 1 )) {
+            opt_pqos &= PQOS_MONITOR;
+        } else if ( val_str && !strcmp(s, "rmid_max") )
+            opt_rmid_max = simple_strtoul(val_str, NULL, 0);
+
+        s = ss + 1;
+    } while ( ss );
+}
+custom_param("pqos", parse_pqos_param);
+
+static void __init init_pqos_monitor(unsigned int rmid_max)
+{
+    unsigned int eax, ebx, ecx, edx;
+    unsigned int rmid;
+
+    if ( !boot_cpu_has(X86_FEATURE_QOSM) )
+        return;
+
+    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &edx);
+    if ( !edx )
+        return;
+
+    pqosm = xzalloc(struct pqos_monitor);
+    if ( !pqosm )
+        return;
+
+    pqosm->qm_features = edx;
+    pqosm->rmid_mask = ~(~0ull << get_count_order(ebx));
+    pqosm->rmid_max = min(rmid_max, ebx);
+
+    if ( pqosm->qm_features & QOS_MONITOR_TYPE_L3 )
+    {
+        cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
+        pqosm->l3m.upscaling_factor = ebx;
+        pqosm->l3m.rmid_max = ecx;
+        pqosm->l3m.l3_features = edx;
+    }
+
+    pqosm->rmid_max = min(rmid_max, pqosm->l3m.rmid_max);
+    BUG_ON(pqosm->rmid_max < 0xffffffff);
+    pqosm->rmid_to_dom = xmalloc_array(domid_t, pqosm->rmid_max + 1);
+    if ( !pqosm->rmid_to_dom )
+    {
+        xfree(pqosm);
+        return;
+    }
+    /* Reserve RMID 0 for all domains not being monitored */
+    pqosm->rmid_to_dom[0] = DOMID_XEN;
+    for ( rmid = 1; rmid <= pqosm->rmid_max; rmid++ )
+        pqosm->rmid_to_dom[rmid] = DOMID_INVALID;
+
+    printk(XENLOG_INFO "Platform QoS Monitoring Enabled.\n");
+}
+
+void __init init_platform_qos(void)
+{
+    if ( opt_pqos && opt_rmid_max )
+        init_pqos_monitor(opt_rmid_max);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 6a814cd..b4ad3a6 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -49,6 +49,7 @@
 #include <xen/cpu.h>
 #include <asm/nmi.h>
 #include <asm/alternative.h>
+#include <asm/pqos.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1440,6 +1441,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     domain_unpause_by_systemcontroller(dom0);
 
+    init_platform_qos();
+
     reset_stack_and_jump(init_done);
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 8014241..eb1dd27 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -148,6 +148,7 @@
 #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
 #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
 #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
 #define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
 #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
new file mode 100644
index 0000000..96bfdc6
--- /dev/null
+++ b/xen/include/asm-x86/pqos.h
@@ -0,0 +1,53 @@
+/*
+ * pqos.h: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef __ASM_PQOS_H__
+#define __ASM_PQOS_H__
+
+/* QoS Resource Type Enumeration */
+#define QOS_MONITOR_TYPE_L3            0x2
+
+/* L3 Monitoring Features */
+#define L3_FEATURE_OCCUPANCY           0x1
+
+struct pqos_monitor_l3 {
+    unsigned int l3_features;
+    unsigned int upscaling_factor;
+    unsigned int rmid_max;
+};
+
+struct pqos_monitor {
+    unsigned long rmid_mask;
+    unsigned int rmid_max;
+    unsigned int qm_features;
+    domid_t *rmid_to_dom;
+    struct pqos_monitor_l3 l3m;
+};
+
+extern struct pqos_monitor *pqosm;
+
+void init_platform_qos(void);
+
+#endif /* __ASM_PQOS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 06/11] x86: dynamically attach/detach QoS monitoring service for a guest
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (4 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 07/11] x86: collect global QoS monitoring information Chao Peng
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add hypervisor side support for dynamically attach and detach QoS
monitoring services for a certain guest.

When attach Qos monitoring service for a guest, system will allocate an
RMID for it. When detach or guest is shutdown, the RMID will be
recycled for future use.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c        |    3 +++
 xen/arch/x86/domctl.c        |   29 ++++++++++++++++++++++++++
 xen/arch/x86/pqos.c          |   46 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |    2 ++
 xen/include/asm-x86/pqos.h   |    9 +++++++++
 xen/include/public/domctl.h  |   12 +++++++++++
 6 files changed, 101 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7e0e78..99fce11 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -60,6 +60,7 @@
 #include <xen/numa.h>
 #include <xen/iommu.h>
 #include <compat/vcpu.h>
+#include <asm/pqos.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 DEFINE_PER_CPU(unsigned long, cr4);
@@ -644,6 +645,8 @@ void arch_domain_destroy(struct domain *d)
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
+
+    pqos_monitor_free_rmid(d);
 }
 
 unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 7a5de43..c90eaf5 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -35,6 +35,7 @@
 #include <asm/mem_sharing.h>
 #include <asm/xstate.h>
 #include <asm/debugger.h>
+#include <asm/pqos.h>
 
 static int gdbsx_guest_mem_io(
     domid_t domid, struct xen_domctl_gdbsx_memio *iop)
@@ -1319,6 +1320,34 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_pqos_monitor_op:
+        if ( !pqos_monitor_enabled() )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        switch ( domctl->u.pqos_monitor_op.cmd )
+        {
+        case XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH:
+            ret = pqos_monitor_alloc_rmid(d);
+            break;
+        case XEN_DOMCTL_PQOS_MONITOR_OP_DETACH:
+            if ( d->arch.pqos_rmid > 0 )
+                pqos_monitor_free_rmid(d);
+            else
+                ret = -ENOENT;
+            break;
+        case XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID:
+            domctl->u.pqos_monitor_op.data = d->arch.pqos_rmid;
+            copyback = 1;
+            break;
+        default:
+            ret = -ENOSYS;
+            break;
+        }
+        break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index aca9ea2..885bd9b 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -15,6 +15,7 @@
  */
 #include <xen/init.h>
 #include <xen/cpu.h>
+#include <xen/sched.h>
 #include <asm/pqos.h>
 
 #define PQOS_MONITOR        (1<<0)
@@ -97,6 +98,51 @@ void __init init_platform_qos(void)
         init_pqos_monitor(opt_rmid_max);
 }
 
+/* Called with domain lock held, no pqos specific lock needed */
+int pqos_monitor_alloc_rmid(struct domain *d)
+{
+    unsigned int rmid;
+
+    ASSERT(pqos_monitor_enabled());
+
+    if ( d->arch.pqos_rmid > 0 )
+        return -EEXIST;
+
+    for ( rmid = 1; rmid <= pqosm->rmid_max; rmid++ )
+    {
+        if ( pqosm->rmid_to_dom[rmid] != DOMID_INVALID)
+            continue;
+
+        pqosm->rmid_to_dom[rmid] = d->domain_id;
+        break;
+    }
+
+    /* No RMID available, assign RMID=0 by default */
+    if ( rmid > pqosm->rmid_max )
+    {
+        d->arch.pqos_rmid = 0;
+        return -EUSERS;
+    }
+
+    d->arch.pqos_rmid = rmid;
+
+    return 0;
+}
+
+/* Called with domain lock held, no pqos specific lock needed */
+void pqos_monitor_free_rmid(struct domain *d)
+{
+    unsigned int rmid;
+
+    rmid = d->arch.pqos_rmid;
+    /* We do not free system reserved "RMID=0" */
+    if ( rmid == 0 )
+        return;
+
+    pqosm->rmid_to_dom[rmid] = DOMID_INVALID;
+    d->arch.pqos_rmid = 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 83329ed..ca295fa 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -313,6 +313,8 @@ struct arch_domain
     /* Shared page for notifying that explicit PIRQ EOI is required. */
     unsigned long *pirq_eoi_map;
     unsigned long pirq_eoi_map_mfn;
+
+    unsigned int pqos_rmid; /* QoS monitoring RMID assigned to the domain */
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 96bfdc6..5737b51 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_PQOS_H__
 #define __ASM_PQOS_H__
 
+#include <xen/types.h>
+
 /* QoS Resource Type Enumeration */
 #define QOS_MONITOR_TYPE_L3            0x2
 
@@ -38,7 +40,14 @@ struct pqos_monitor {
 
 extern struct pqos_monitor *pqosm;
 
+static inline bool_t pqos_monitor_enabled(void)
+{
+    return !!pqosm;
+}
+
 void init_platform_qos(void);
+int pqos_monitor_alloc_rmid(struct domain *d);
+void pqos_monitor_free_rmid(struct domain *d);
 
 #endif /* __ASM_PQOS_H__ */
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5b11bbf..af9e775 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -936,6 +936,16 @@ typedef struct xen_domctl_vcpu_msrs xen_domctl_vcpu_msrs_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vcpu_msrs_t);
 #endif
 
+struct xen_domctl_pqos_monitor_op {
+#define XEN_DOMCTL_PQOS_MONITOR_OP_DETACH         0
+#define XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH         1
+#define XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID     2
+    uint32_t cmd;
+    uint32_t data;
+};
+typedef struct xen_domctl_pqos_op xen_domctl_pqos_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_pqos_op_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1008,6 +1018,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_cacheflush                    71
 #define XEN_DOMCTL_get_vcpu_msrs                 72
 #define XEN_DOMCTL_set_vcpu_msrs                 73
+#define XEN_DOMCTL_pqos_monitor_op               74
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1068,6 +1079,7 @@ struct xen_domctl {
         struct xen_domctl_cacheflush        cacheflush;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_pqos_monitor_op   pqos_monitor_op;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 07/11] x86: collect global QoS monitoring information
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (5 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 06/11] x86: dynamically attach/detach QoS monitoring service for a guest Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 08/11] x86: enable QoS monitoring for each domain RMID Chao Peng
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

This implementation tries to put all policies into user space, thus some
global QoS monitoring information needs to be exposed, such as the total
RMID count, L3 upscaling factor, etc.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/cpu/intel_cacheinfo.c |   49 ++----------------------------------
 xen/arch/x86/sysctl.c              |   43 +++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h   |   45 +++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h        |   14 +++++++++++
 4 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/cpu/intel_cacheinfo.c b/xen/arch/x86/cpu/intel_cacheinfo.c
index 430f939..48970c0 100644
--- a/xen/arch/x86/cpu/intel_cacheinfo.c
+++ b/xen/arch/x86/cpu/intel_cacheinfo.c
@@ -81,54 +81,9 @@ static struct _cache_table cache_table[] __cpuinitdata =
 	{ 0x00, 0, 0}
 };
 
-
-enum _cache_type
-{
-	CACHE_TYPE_NULL	= 0,
-	CACHE_TYPE_DATA = 1,
-	CACHE_TYPE_INST = 2,
-	CACHE_TYPE_UNIFIED = 3
-};
-
-union _cpuid4_leaf_eax {
-	struct {
-		enum _cache_type	type:5;
-		unsigned int		level:3;
-		unsigned int		is_self_initializing:1;
-		unsigned int		is_fully_associative:1;
-		unsigned int		reserved:4;
-		unsigned int		num_threads_sharing:12;
-		unsigned int		num_cores_on_die:6;
-	} split;
-	u32 full;
-};
-
-union _cpuid4_leaf_ebx {
-	struct {
-		unsigned int		coherency_line_size:12;
-		unsigned int		physical_line_partition:10;
-		unsigned int		ways_of_associativity:10;
-	} split;
-	u32 full;
-};
-
-union _cpuid4_leaf_ecx {
-	struct {
-		unsigned int		number_of_sets:32;
-	} split;
-	u32 full;
-};
-
-struct _cpuid4_info {
-	union _cpuid4_leaf_eax eax;
-	union _cpuid4_leaf_ebx ebx;
-	union _cpuid4_leaf_ecx ecx;
-	unsigned long size;
-};
-
 unsigned short			num_cache_leaves;
 
-static int __cpuinit cpuid4_cache_lookup(int index, struct _cpuid4_info *this_leaf)
+int cpuid4_cache_lookup(int index, struct cpuid4_info *this_leaf)
 {
 	union _cpuid4_leaf_eax 	eax;
 	union _cpuid4_leaf_ebx 	ebx;
@@ -185,7 +140,7 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		 * parameters cpuid leaf to find the cache details
 		 */
 		for (i = 0; i < num_cache_leaves; i++) {
-			struct _cpuid4_info this_leaf;
+			struct cpuid4_info this_leaf;
 
 			int retval;
 
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 15d4b91..dc6012d 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/nodemask.h>
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
+#include <asm/pqos.h>
 
 #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
 
@@ -101,6 +102,48 @@ long arch_do_sysctl(
     }
     break;
 
+    case XEN_SYSCTL_pqos_monitor_op:
+        if ( !pqos_monitor_enabled() )
+            return -ENODEV;
+
+        if ( sysctl->u.pqos_monitor_op.flags != 0 )
+            return -EINVAL;
+
+        switch ( sysctl->u.pqos_monitor_op.cmd )
+        {
+        case XEN_SYSCTL_PQOS_MONITOR_cqm_enabled:
+            sysctl->u.pqos_monitor_op.data =
+                (pqosm->qm_features & QOS_MONITOR_TYPE_L3) &&
+                (pqosm->l3m.l3_features & L3_FEATURE_OCCUPANCY);
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_total_rmid:
+            sysctl->u.pqos_monitor_op.data = pqosm->rmid_max;
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor:
+            sysctl->u.pqos_monitor_op.data = pqosm->l3m.upscaling_factor;
+            break;
+        case XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size:
+        {
+            struct cpuid4_info info;
+
+            ret = cpuid4_cache_lookup(3, &info);
+            if ( ret < 0 )
+                break;
+
+            sysctl->u.pqos_monitor_op.data = info.size / 1024; /* in KB unit */
+        }
+        break;
+        default:
+            sysctl->u.pqos_monitor_op.data = 0;
+            ret = -ENOSYS;
+            break;
+        }
+
+        if ( __copy_to_guest(u_sysctl, sysctl, 1) )
+            ret = -EFAULT;
+
+        break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index eb1dd27..fe1ab84 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -215,6 +215,51 @@
 #define cpu_has_vmx		boot_cpu_has(X86_FEATURE_VMXE)
 
 #define cpu_has_cpuid_faulting	boot_cpu_has(X86_FEATURE_CPUID_FAULTING)
+
+enum _cache_type {
+    CACHE_TYPE_NULL = 0,
+    CACHE_TYPE_DATA = 1,
+    CACHE_TYPE_INST = 2,
+    CACHE_TYPE_UNIFIED = 3
+};
+
+union _cpuid4_leaf_eax {
+    struct {
+        enum _cache_type type:5;
+        unsigned int level:3;
+        unsigned int is_self_initializing:1;
+        unsigned int is_fully_associative:1;
+        unsigned int reserved:4;
+        unsigned int num_threads_sharing:12;
+        unsigned int num_cores_on_die:6;
+    } split;
+    u32 full;
+};
+
+union _cpuid4_leaf_ebx {
+    struct {
+        unsigned int coherency_line_size:12;
+        unsigned int physical_line_partition:10;
+        unsigned int ways_of_associativity:10;
+    } split;
+    u32 full;
+};
+
+union _cpuid4_leaf_ecx {
+    struct {
+        unsigned int number_of_sets:32;
+    } split;
+    u32 full;
+};
+
+struct cpuid4_info {
+    union _cpuid4_leaf_eax eax;
+    union _cpuid4_leaf_ebx ebx;
+    union _cpuid4_leaf_ecx ecx;
+    unsigned long size;
+};
+
+int cpuid4_cache_lookup(int index, struct cpuid4_info *this_leaf);
 #endif
 
 #endif /* __ASM_I386_CPUFEATURE_H */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3588698..b6dffcb 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -636,6 +636,18 @@ struct xen_sysctl_coverage_op {
 typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
 
+#define XEN_SYSCTL_PQOS_MONITOR_get_total_rmid            0
+#define XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor   1
+/* The L3 cache size is returned in KB unit */
+#define XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size         2
+#define XEN_SYSCTL_PQOS_MONITOR_cqm_enabled               3
+struct xen_sysctl_pqos_monitor_op {
+    uint32_t cmd;
+    uint32_t flags;      /* padding variable, may be extended for future use */
+    uint64_t data;
+};
+typedef struct xen_sysctl_pqos_monitor_op xen_sysctl_pqos_monitor_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pqos_monitor_op_t);
 
 struct xen_sysctl {
     uint32_t cmd;
@@ -658,6 +670,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
+#define XEN_SYSCTL_pqos_monitor_op               21
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -679,6 +692,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpupool_op        cpupool_op;
         struct xen_sysctl_scheduler_op      scheduler_op;
         struct xen_sysctl_coverage_op       coverage_op;
+        struct xen_sysctl_pqos_monitor_op   pqos_monitor_op;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 08/11] x86: enable QoS monitoring for each domain RMID
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (6 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 07/11] x86: collect global QoS monitoring information Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 09/11] x86: add QoS monitoring related MSRs in allowed list Chao Peng
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

If the QoS monitoring service is attached to a domain, its related RMID
will be set to hardware for monitoring when the domain's vcpu is
scheduled in. When the domain's vcpu is scheduled out, RMID 0
(system reserved) will be set for monitoring.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c           |    5 +++++
 xen/arch/x86/pqos.c             |   27 +++++++++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |    3 +++
 xen/include/asm-x86/pqos.h      |    1 +
 4 files changed, 36 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 99fce11..bb448ef 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1415,6 +1415,8 @@ static void __context_switch(void)
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
+        if ( pqos_monitor_enabled() )
+            pqos_monitor_assoc_rmid(0);
         p->arch.ctxt_switch_from(p);
     }
 
@@ -1439,6 +1441,9 @@ static void __context_switch(void)
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
+
+        if ( pqos_monitor_enabled() && n->domain->arch.pqos_rmid > 0 )
+            pqos_monitor_assoc_rmid(n->domain->arch.pqos_rmid);
     }
 
     gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index 885bd9b..8a8e0b2 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -20,9 +20,15 @@
 
 #define PQOS_MONITOR        (1<<0)
 
+struct pqr_assoc {
+    uint64_t val;
+    bool_t initialized;
+};
+
 struct pqos_monitor *__read_mostly pqosm = NULL;
 static bool_t __initdata opt_pqos = 0;
 static unsigned int __initdata opt_rmid_max = 255;
+static DEFINE_PER_CPU(struct pqr_assoc, pqr_assoc);
 
 static void __init parse_pqos_param(char *s)
 {
@@ -143,6 +149,27 @@ void pqos_monitor_free_rmid(struct domain *d)
     d->arch.pqos_rmid = 0;
 }
 
+void pqos_monitor_assoc_rmid(unsigned int rmid)
+{
+    uint64_t val;
+    uint64_t new_val;
+    struct pqr_assoc *pqr = &this_cpu(pqr_assoc);
+
+    if ( !pqr->initialized )
+    {
+        rdmsrl(MSR_IA32_PQR_ASSOC, pqr->val);
+        pqr->initialized = 1;
+    }
+    val = pqr->val;
+
+    new_val = (val & ~pqosm->rmid_mask) | (rmid & pqosm->rmid_mask);
+    if ( val != new_val )
+    {
+        wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
+        pqr->val = new_val;
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index da732b7..0cf9548 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -328,6 +328,9 @@
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
 
+/* Platform QoS MSRs */
+#define MSR_IA32_PQR_ASSOC		0x00000c8f
+
 /* Intel Model 6 */
 #define MSR_P6_EVNTSEL0			0x00000186
 #define MSR_P6_EVNTSEL1			0x00000187
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 5737b51..31b26b4 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -48,6 +48,7 @@ static inline bool_t pqos_monitor_enabled(void)
 void init_platform_qos(void);
 int pqos_monitor_alloc_rmid(struct domain *d);
 void pqos_monitor_free_rmid(struct domain *d);
+void pqos_monitor_assoc_rmid(unsigned int rmid);
 
 #endif /* __ASM_PQOS_H__ */
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 09/11] x86: add QoS monitoring related MSRs in allowed list
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (7 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 08/11] x86: enable QoS monitoring for each domain RMID Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 10/11] xsm: add platform QoS related xsm policies Chao Peng
  2014-09-05  8:37 ` [PATCH v15 11/11] tools: CMDs and APIs for Platform QoS Monitoring Chao Peng
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Tool stack will try to access the two MSRs to perform QoS monitoring
related operations, thus added them in the allowed list.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 xen/arch/x86/platform_hypercall.c |    7 +++++++
 xen/include/asm-x86/msr-index.h   |    2 ++
 2 files changed, 9 insertions(+)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index e5ad4c9..a198456 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -68,6 +68,13 @@ struct xen_resource_access {
 
 static bool_t allow_access_msr(unsigned int msr)
 {
+    switch ( msr )
+    {
+    case MSR_IA32_QOSEVTSEL:
+    case MSR_IA32_QMC:
+        return 1;
+    }
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 0cf9548..c5b57ba 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -329,6 +329,8 @@
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
 
 /* Platform QoS MSRs */
+#define MSR_IA32_QOSEVTSEL		0x00000c8d
+#define MSR_IA32_QMC			0x00000c8e
 #define MSR_IA32_PQR_ASSOC		0x00000c8f
 
 /* Intel Model 6 */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 10/11] xsm: add platform QoS related xsm policies
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (8 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 09/11] x86: add QoS monitoring related MSRs in allowed list Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  2014-09-05  8:37 ` [PATCH v15 11/11] tools: CMDs and APIs for Platform QoS Monitoring Chao Peng
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Add xsm policies for QoS monitoring related hypercalls.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    3 ++-
 xen/xsm/flask/hooks.c                        |    7 +++++++
 xen/xsm/flask/policy/access_vectors          |    4 ++++
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index dedc035..5e7042c 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -49,7 +49,7 @@ define(`create_domain_common', `
 			getdomaininfo hypercall setvcpucontext setextvcpucontext
 			getscheduler getvcpuinfo getvcpuextstate getaddrsize
 			getaffinity setaffinity };
-	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim  set_max_evtchn };
+	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim set_max_evtchn pqos_monitor_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 562b8df..fc3fdb2 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -66,6 +66,7 @@ allow dom0_t xen_t:xen {
 };
 allow dom0_t xen_t:xen2 {
     resource_op
+    pqos_monitor_op
 };
 allow dom0_t xen_t:mmu memorymap;
 
@@ -79,7 +80,7 @@ allow dom0_t dom0_t:domain {
 	getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-	set_cpuid gettsc settsc setscheduler set_max_evtchn
+	set_cpuid gettsc settsc setscheduler set_max_evtchn pqos_monitor_op
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index fcfed25..7d3fb52 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -715,6 +715,9 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_cacheflush:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CACHEFLUSH);
 
+    case XEN_DOMCTL_pqos_monitor_op:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PQOS_MONITOR_OP);
+
     default:
         printk("flask_domctl: Unknown op %d\n", cmd);
         return -EPERM;
@@ -770,6 +773,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_numainfo:
         return domain_has_xen(current->domain, XEN__PHYSINFO);
 
+    case XEN_SYSCTL_pqos_monitor_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__PQOS_MONITOR_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index b606441..fd71ac7 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -81,6 +81,8 @@ class xen2
 {
 # XENPF_resource_op
     resource_op
+# XEN_SYSCTL_pqos_monitor_op
+    pqos_monitor_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
@@ -208,6 +210,8 @@ class domain2
     cacheflush
 # Creation of the hardware domain when it is not dom0
     create_hardware_domain
+# XEN_DOMCTL_pqos_monitor_op
+    pqos_monitor_op
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v15 11/11] tools: CMDs and APIs for Platform QoS Monitoring
  2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
                   ` (9 preceding siblings ...)
  2014-09-05  8:37 ` [PATCH v15 10/11] xsm: add platform QoS related xsm policies Chao Peng
@ 2014-09-05  8:37 ` Chao Peng
  10 siblings, 0 replies; 36+ messages in thread
From: Chao Peng @ 2014-09-05  8:37 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Introduced some new xl commands to enable/disable platform QoS
monitoring feature.

The following two commands is to attach/detach the QoS monitoring
feature to/from a certain domain.

$ xl pqos-monitor-attach domid
$ xl pqos-monitor-detach domid

This command is to display the QoS monitoring information, such as CQM,
to show the L3 cache occupancy.

$ xl pqos-monitor-show cache_occupancy <domid>

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 docs/man/xl.pod.1           |   24 +++++
 tools/libxc/Makefile        |    1 +
 tools/libxc/xc_msr_x86.h    |   36 ++++++++
 tools/libxc/xc_pqos.c       |  210 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h       |   17 ++++
 tools/libxl/Makefile        |    2 +-
 tools/libxl/libxl.h         |   19 ++++
 tools/libxl/libxl_pqos.c    |  184 +++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |    4 +
 tools/libxl/libxl_utils.c   |   28 ++++++
 tools/libxl/xl.h            |    3 +
 tools/libxl/xl_cmdimpl.c    |  131 +++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c   |   17 ++++
 13 files changed, 675 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxc/xc_msr_x86.h
 create mode 100644 tools/libxc/xc_pqos.c
 create mode 100644 tools/libxl/libxl_pqos.c

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 9d1c2a5..ffb7191 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1382,6 +1382,30 @@ Load FLASK policy from the given policy file. The initial policy is provided to
 the hypervisor as a multiboot module; this command allows runtime updates to the
 policy. Loading new security policy will reset runtime changes to device labels.
 
+=head1 PLATFORM QOS MONITORING
+
+Some new hardware may offer monitoring capability in each logical processor to
+measure specific quality-of-service metric. In Xen implementation, the
+monitoring granularity is domain level. To monitor a specific domain, just
+attach the domain id with the monitoring service. When the domain doesn't need
+to be monitored any more, detach the domain id from the monitoring service.
+
+=over 4
+
+=item B<pqos-monitor-attach> [I<domain-id>]
+
+attach: Attach the platform QoS monitoring service to a domain.
+
+=item B<pqos-monitor-detach> [I<domain-id>]
+
+detach: Detach the platform QoS monitoring service from a domain.
+
+=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
+
+Show monitoring data for a certain domain or all domains. Current supported
+QoS monitor types are:
+ - "cache-occupancy": showing the L3 cache occupancy.
+
 =back
 
 =head1 TO BE DOCUMENTED
diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index dde6109..38343ea 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -35,6 +35,7 @@ CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-y       += xc_resource.c
+CTRL_SRCS-y       += xc_pqos.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
 CTRL_SRCS-$(CONFIG_Linux) += xc_linux.c xc_linux_osdep.c
 CTRL_SRCS-$(CONFIG_FreeBSD) += xc_freebsd.c xc_freebsd_osdep.c
diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h
new file mode 100644
index 0000000..1e0ee99
--- /dev/null
+++ b/tools/libxc/xc_msr_x86.h
@@ -0,0 +1,36 @@
+/*
+ * xc_msr_x86.h
+ *
+ * MSR definition macros
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef XC_MSR_X86_H
+#define XC_MSR_X86_H
+
+#define MSR_IA32_QOSEVTSEL      0x00000c8d
+#define MSR_IA32_QMC            0x00000c8e
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xc_pqos.c b/tools/libxc/xc_pqos.c
new file mode 100644
index 0000000..14014e4
--- /dev/null
+++ b/tools/libxc/xc_pqos.c
@@ -0,0 +1,210 @@
+/*
+ * xc_pqos.c
+ *
+ * platform QoS related API functions.
+ *
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "xc_private.h"
+#include "xc_msr_x86.h"
+
+#define IA32_QM_CTR_ERROR_MASK         (0x3ull << 62)
+
+#define EVTID_L3_OCCUPANCY             0x1
+
+int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_ATTACH;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_DETACH;
+
+    return do_domctl(xch, &domctl);
+}
+
+int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
+                                    uint32_t *rmid)
+{
+    int rc;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_pqos_monitor_op;
+    domctl.domain = (domid_t)domid;
+    domctl.u.pqos_monitor_op.cmd = XEN_DOMCTL_PQOS_MONITOR_OP_QUERY_RMID;
+
+    rc = do_domctl(xch, &domctl);
+
+    if ( !rc )
+        *rmid = domctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *total_rmid = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd = XEN_SYSCTL_PQOS_MONITOR_get_total_rmid;
+    sysctl.u.pqos_monitor_op.flags = 0;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *total_rmid = sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
+                                            uint32_t *upscaling_factor)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *upscaling_factor = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd =
+        XEN_SYSCTL_PQOS_MONITOR_get_l3_upscaling_factor;
+    sysctl.u.pqos_monitor_op.flags = 0;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *upscaling_factor = sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
+                                      uint32_t *l3_cache_size)
+{
+    static int val = 0;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val )
+    {
+        *l3_cache_size = val;
+        return 0;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd =
+        XEN_SYSCTL_PQOS_MONITOR_get_l3_cache_size;
+    sysctl.u.pqos_monitor_op.flags = 0;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        val = *l3_cache_size= sysctl.u.pqos_monitor_op.data;
+
+    return rc;
+}
+
+int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
+    uint32_t cpu, xc_pqos_monitor_type type, uint64_t *monitor_data)
+{
+    xc_resource_data_t rsc_data[2];
+    uint32_t evtid;
+    int rc;
+
+    switch ( type )
+    {
+    case XC_PQOS_MONITOR_L3_OCCUPANCY:
+        evtid = EVTID_L3_OCCUPANCY;
+        break;
+    default:
+        return -1;
+    }
+
+    rsc_data[0].rsc_op.cmd = XEN_RESOURCE_OP_MSR_WRITE;
+    rsc_data[0].rsc_op.cpu = cpu;
+    rsc_data[0].rsc_op.idx = MSR_IA32_QOSEVTSEL;
+    rsc_data[0].rsc_op.val = (uint64_t)rmid << 32 | evtid;
+    rsc_data[0].flags = MC_NO_PREEMPT;
+
+    rsc_data[1].rsc_op.cmd = XEN_RESOURCE_OP_MSR_READ;
+    rsc_data[1].rsc_op.cpu = cpu;
+    rsc_data[1].rsc_op.idx = MSR_IA32_QMC;
+    rsc_data[1].rsc_op.val = 0;
+
+    rc = xc_resource_op(xch, 2, rsc_data);
+    if ( rc )
+        return rc;
+
+    if ( rsc_data[1].rsc_op.val & IA32_QM_CTR_ERROR_MASK )
+        return -1;
+
+    *monitor_data = rsc_data[1].rsc_op.val;
+
+    return 0;
+}
+
+int xc_pqos_monitor_cqm_enabled(xc_interface *xch)
+{
+    static int val = -1;
+    int rc;
+    DECLARE_SYSCTL;
+
+    if ( val >= 0 )
+        return val;
+
+    sysctl.cmd = XEN_SYSCTL_pqos_monitor_op;
+    sysctl.u.pqos_monitor_op.cmd = XEN_SYSCTL_PQOS_MONITOR_cqm_enabled;
+    sysctl.u.pqos_monitor_op.flags = 0;
+
+    rc = do_sysctl(xch, &sysctl);
+    if ( !rc )
+    {
+        val = sysctl.u.pqos_monitor_op.data;
+        return val;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 0fa0c12..089de94 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2655,6 +2655,23 @@ struct xc_resource_data {
 typedef struct xc_resource_data xc_resource_data_t;
 int xc_resource_op(xc_interface *xch, uint32_t nr, xc_resource_data_t *data);
 
+enum xc_pqos_monitor_type {
+    XC_PQOS_MONITOR_L3_OCCUPANCY,
+};
+typedef enum xc_pqos_monitor_type xc_pqos_monitor_type;
+int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
+int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
+int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
+    uint32_t *rmid);
+int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
+int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
+    uint32_t *upscaling_factor);
+int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
+    uint32_t *l3_cache_size);
+int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
+    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
+int xc_pqos_monitor_cqm_enabled(xc_interface *xch);
+
 #endif /* XENCTRL_H */
 
 /*
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index bd0db3b..cf31058 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -43,7 +43,7 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
-LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
+LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_pqos.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
 ifeq ($(CONFIG_NetBSD),y)
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 460207b..83a79e0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -625,6 +625,13 @@ typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_BYTES(mac) mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]
 void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
 
+/*
+ * LIBXL_HAVE_PQOS_MONITOR_CQM
+ *
+ * If this is defined, the cache QoS monitoring feature is supported.
+ */
+#define LIBXL_HAVE_PQOS_MONITOR_CQM 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1341,6 +1348,18 @@ bool libxl_ms_vm_genid_is_zero(const libxl_ms_vm_genid *id);
 void libxl_ms_vm_genid_copy(libxl_ctx *ctx, libxl_ms_vm_genid *dst,
                             libxl_ms_vm_genid *src);
 
+int libxl_get_socket_cpu(libxl_ctx *ctx, uint32_t socketid);
+
+int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
+int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);
+int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
+int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
+    uint32_t *l3_cache_size);
+int libxl_pqos_monitor_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
+    uint32_t socketid, uint32_t *l3_cache_occupancy);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_pqos.c b/tools/libxl/libxl_pqos.c
new file mode 100644
index 0000000..7cb7508
--- /dev/null
+++ b/tools/libxl/libxl_pqos.c
@@ -0,0 +1,184 @@
+/*
+ * Copyright (C) 2014      Intel Corporation
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+
+
+#define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+
+static void libxl_pqos_monitor_err_msg(libxl_ctx *ctx, int err)
+{
+    GC_INIT(ctx);
+
+    char *msg;
+
+    switch (err) {
+    case ENOSYS:
+        msg = "unsupported operation";
+        break;
+    case ENODEV:
+        msg = "Platform QoS Monitoring is not supported in this system";
+        break;
+    case EEXIST:
+        msg = "Platform QoS Monitoring is already attached to this domain";
+        break;
+    case ENOENT:
+        msg = "Platform QoS Monitoring is not attached to this domain";
+        break;
+    case EUSERS:
+        msg = "there is no free RMID available";
+        break;
+    case ESRCH:
+        msg = "is this Domain ID valid?";
+        break;
+    case EFAULT:
+        msg = "failed to exchange data with Xen";
+        break;
+    default:
+        msg = "unknown error";
+        break;
+    }
+
+    LOGE(ERROR, "%s", msg);
+
+    GC_FREE;
+}
+
+int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_attach(ctx->xch, domid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_detach(ctx->xch, domid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid)
+{
+    int rc;
+    uint32_t rmid;
+
+    rc = xc_pqos_monitor_get_domain_rmid(ctx->xch, domid, &rmid);
+    if (rc < 0)
+        return 0;
+
+    return !!rmid;
+}
+
+int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx)
+{
+    return xc_pqos_monitor_cqm_enabled(ctx->xch);
+}
+
+int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_get_total_rmid(ctx->xch, total_rmid);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
+                                         uint32_t *l3_cache_size)
+{
+    int rc;
+
+    rc = xc_pqos_monitor_get_l3_cache_size(ctx->xch, l3_cache_size);
+    if (rc < 0) {
+        libxl_pqos_monitor_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_monitor_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
+    uint32_t socketid, uint32_t *l3_cache_occupancy)
+{
+    GC_INIT(ctx);
+
+    unsigned int rmid;
+    uint32_t upscaling_factor;
+    uint64_t monitor_data;
+    int cpu, rc;
+    xc_pqos_monitor_type type;
+
+    rc = xc_pqos_monitor_get_domain_rmid(ctx->xch, domid, &rmid);
+    if (rc < 0 || rmid == 0) {
+        LOGE(ERROR, "fail to get the domain rmid, "
+            "or domain is not attached with platform QoS monitoring service");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    cpu = libxl_get_socket_cpu(ctx, socketid);
+    if (cpu < 0) {
+        LOGE(ERROR, "failed to get socket cpu");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    type = XC_PQOS_MONITOR_L3_OCCUPANCY;
+    rc = xc_pqos_monitor_get_data(ctx->xch, rmid, cpu, type, &monitor_data);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get monitoring data");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = xc_pqos_monitor_get_l3_upscaling_factor(ctx->xch, &upscaling_factor);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get L3 upscaling factor");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *l3_cache_occupancy = upscaling_factor * monitor_data / 1024;
+    rc = 0;
+out:
+    GC_FREE;
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 931c9e9..f8bdc09 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -632,3 +632,7 @@ libxl_event = Struct("event",[
                                  ])),
            ("domain_create_console_available", None),
            ]))])
+
+libxl_pqos_monitor_type = Enumeration("pqos_monitor_type", [
+    (1, "CACHE_OCCUPANCY"),
+    ])
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 58df4f3..8ec822d 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -1065,6 +1065,34 @@ int libxl__random_bytes(libxl__gc *gc, uint8_t *buf, size_t len)
     return ret;
 }
 
+int libxl_get_socket_cpu(libxl_ctx *ctx, uint32_t socketid)
+{
+    int i, j, cpu, nr_cpus;
+    libxl_cputopology *topology;
+    int *socket_cpus;
+
+    topology = libxl_get_cpu_topology(ctx, &nr_cpus);
+    if (!topology)
+        return ERROR_FAIL;
+
+    socket_cpus = malloc(sizeof(int) * nr_cpus);
+    if (!socket_cpus) {
+        free(topology);
+        return ERROR_FAIL;
+    }
+
+    for (i = 0, j = 0; i < nr_cpus; i++)
+        if (topology[i].socket == socketid)
+            socket_cpus[j++] = i;
+
+    cpu = socket_cpus[rand() % j];
+
+    free(socket_cpus);
+    free(topology);
+
+    return cpu;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..4e9635a 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -110,6 +110,9 @@ int main_loadpolicy(int argc, char **argv);
 int main_remus(int argc, char **argv);
 #endif
 int main_devd(int argc, char **argv);
+int main_pqos_monitor_attach(int argc, char **argv);
+int main_pqos_monitor_detach(int argc, char **argv);
+int main_pqos_monitor_show(int argc, char **argv);
 
 void help(const char *command);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index e6b9615..0338842 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7391,6 +7391,137 @@ out:
     return ret;
 }
 
+static int pqos_monitor_show_cache_occupancy(uint32_t domid)
+{
+    uint32_t i, socketid, nr_sockets, total_rmid;
+    uint32_t l3_cache_size, l3_cache_occupancy;
+    libxl_physinfo info;
+    char *domain_name;
+    int rc, print_header, nr_domains;
+    libxl_dominfo *dominfo;
+
+    if (!libxl_pqos_monitor_cqm_enabled(ctx)) {
+        printf("CQM is not supported in the system\n");
+        return -1;
+    }
+
+    rc = libxl_get_physinfo(ctx, &info);
+    if (rc < 0) {
+        printf("failed to get system socket number\n");
+        return -1;
+    }
+    nr_sockets = info.nr_cpus / info.threads_per_core / info.cores_per_socket;
+
+    rc = libxl_pqos_monitor_get_total_rmid(ctx, &total_rmid);
+    if (rc < 0) {
+        printf("failed to get system total rmid number\n");
+        return -1;
+    }
+
+    rc = libxl_pqos_monitor_get_l3_cache_size(ctx, &l3_cache_size);
+    if (rc < 0) {
+        printf("failed to get system l3 cache size\n");
+        return -1;
+    }
+
+    printf("Total RMID: %d\n", total_rmid);
+    printf("Per-Socket L3 Cache Size: %d KB\n", l3_cache_size);
+
+    print_header = 1;
+    if (!(dominfo = libxl_list_domain(ctx, &nr_domains))) {
+        fprintf(stderr, "libxl_list_domain failed.\n");
+        return -1;
+    }
+    for (i = 0; i < nr_domains; i++) {
+        if (domid != ~0 && dominfo[i].domid != domid)
+            continue;
+        if (!libxl_pqos_monitor_domain_attached(ctx, dominfo[i].domid))
+            continue;
+        if (print_header) {
+            printf("%-40s %5s", "Name", "ID");
+            for (socketid = 0; socketid < nr_sockets; socketid++)
+                printf("%14s %d", "Socket", socketid);
+            printf("\n");
+            print_header = 0;
+        }
+        domain_name = libxl_domid_to_name(ctx, dominfo[i].domid);
+        printf("%-40s %5d", domain_name, dominfo[i].domid);
+        free(domain_name);
+        for (socketid = 0; socketid < nr_sockets; socketid++) {
+            rc = libxl_pqos_monitor_get_cache_occupancy(ctx, dominfo[i].domid,
+                     socketid, &l3_cache_occupancy);
+            printf("%13u KB", l3_cache_occupancy);
+        }
+        printf("\n");
+    }
+    libxl_dominfo_list_free(dominfo, nr_domains);
+
+    return 0;
+}
+
+int main_pqos_monitor_attach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, ret = 0;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-attach", 1) {
+        /* No options */
+    }
+
+    domid = find_domain(argv[optind]);
+    ret = libxl_pqos_monitor_attach(ctx, domid);
+
+    return ret;
+}
+
+int main_pqos_monitor_detach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, ret = 0;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-detach", 1) {
+        /* No options */
+    }
+
+    domid = find_domain(argv[optind]);
+    ret = libxl_pqos_monitor_detach(ctx, domid);
+
+    return ret;
+}
+
+int main_pqos_monitor_show(int argc, char **argv)
+{
+    int opt, ret = 0;
+    uint32_t domid;
+    libxl_pqos_monitor_type type;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-monitor-show", 1) {
+        /* No options */
+    }
+
+    libxl_pqos_monitor_type_from_string(argv[optind], &type);
+
+    if (optind + 1 >= argc)
+        domid = ~0;
+    else if (optind + 1 == argc - 1)
+        domid = find_domain(argv[optind + 1]);
+    else {
+        help("pqos-monitor-show");
+        return 2;
+    }
+
+    switch (type) {
+    case LIBXL_PQOS_MONITOR_TYPE_CACHE_OCCUPANCY:
+        ret = pqos_monitor_show_cache_occupancy(domid);
+        break;
+    default:
+        help("pqos-monitor-show");
+        return 2;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7b7fa92..e57d85a 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -500,6 +500,23 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-F                      Run in the foreground",
     },
+    { "pqos-monitor-attach",
+      &main_pqos_monitor_attach, 0, 1,
+      "Attach platform QoS monitoring to a domain",
+      "<Domain>",
+    },
+    { "pqos-monitor-detach",
+      &main_pqos_monitor_detach, 0, 1,
+      "Detach platform QoS monitoring from a domain",
+      "<Domain>",
+    },
+    { "pqos-monitor-show",
+      &main_pqos_monitor_show, 0, 1,
+      "Show platform QoS monitoring information",
+      "<QoS-Monitor-Type> <Domain>",
+      "Available monitor types:\n"
+      "\"cache_occupancy\":         Show L3 cache occupancy\n",
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-05  8:37 ` [PATCH v15 01/11] multicall: add no preemption ability between two calls Chao Peng
@ 2014-09-05 10:46   ` Andrew Cooper
  2014-09-09  6:43     ` Chao Peng
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-05 10:46 UTC (permalink / raw)
  To: Chao Peng, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, JBeulich, dgdegra

On 05/09/14 09:37, Chao Peng wrote:
> Add a flag to indicate if the execution can be preempted between two
> calls. If not specified, stay preemptable.
>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  xen/common/multicall.c   |    5 ++++-
>  xen/include/public/xen.h |    4 ++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
> index fa9d910..83b96eb 100644
> --- a/xen/common/multicall.c
> +++ b/xen/common/multicall.c
> @@ -40,6 +40,7 @@ do_multicall(
>      struct mc_state *mcs = &current->mc_state;
>      uint32_t         i;
>      int              rc = 0;
> +    bool_t           preemptable = 0;
>  
>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>      {
> @@ -52,7 +53,7 @@ do_multicall(
>  
>      for ( i = 0; !rc && i < nr_calls; i++ )
>      {
> -        if ( i && hypercall_preempt_check() )
> +        if ( preemptable && hypercall_preempt_check() )
>              goto preempted;
>  
>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
> @@ -61,6 +62,8 @@ do_multicall(
>              break;
>          }
>  
> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
> +

Please consider what would happen if a malicious guest set NO_PREEMPT on
every multicall entry.

IMO, two back-to-back entries whith NO_PREEMPT set is grounds for an
immediate hypercall failure.

Furthermore, what happens if one of the multicall operations pre-empts
of its own accord?

~Andrew

>          trace_multicall_call(&mcs->call);
>  
>          do_multicall_call(&mcs->call);
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> index a6a2092..3f8b908 100644
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -550,10 +550,14 @@ DEFINE_XEN_GUEST_HANDLE(mmu_update_t);
>  struct multicall_entry {
>      xen_ulong_t op, result;
>      xen_ulong_t args[6];
> +    uint32_t flags;
>  };
>  typedef struct multicall_entry multicall_entry_t;
>  DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
>  
> +/* These flags passed in the 'flags' field of multicall_entry_t. */
> +#define MC_NO_PREEMPT    (1<<0)  /* Can't be preempted before next entry? */
> +
>  #if __XEN_INTERFACE_VERSION__ < 0x00040400
>  /*
>   * Event channel endpoints per domain (when using the 2-level ABI):

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-05  8:37 ` [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall Chao Peng
@ 2014-09-05 10:59   ` Andrew Cooper
  2014-09-05 11:49     ` Jan Beulich
  2014-09-10  2:55     ` Chao Peng
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Cooper @ 2014-09-05 10:59 UTC (permalink / raw)
  To: Chao Peng, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, JBeulich, dgdegra

On 05/09/14 09:37, Chao Peng wrote:
> Add a generic resource access hypercall for tool stack or other
> components, e.g., accessing MSR, port I/O, etc.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  xen/arch/x86/platform_hypercall.c |   63 +++++++++++++++++++++++++++++++++++++
>  xen/include/public/platform.h     |   15 +++++++++
>  xen/include/xlat.lst              |    1 +
>  3 files changed, 79 insertions(+)
>
> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> index 2162811..e5ad4c9 100644
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -61,6 +61,42 @@ long cpu_down_helper(void *data);
>  long core_parking_helper(void *data);
>  uint32_t get_cur_idle_nums(void);
>  
> +struct xen_resource_access {
> +    int32_t ret;
> +    struct xenpf_resource_op *data;
> +};
> +
> +static bool_t allow_access_msr(unsigned int msr)
> +{
> +    return 0;
> +}
> +
> +static void resource_access_one(void *info)
> +{
> +    struct xen_resource_access *ra = info;
> +    int ret = 0;
> +
> +    switch ( ra->data->cmd )
> +    {
> +    case XEN_RESOURCE_OP_MSR_READ:
> +    case XEN_RESOURCE_OP_MSR_WRITE:
> +        if ( ra->data->idx >> 32 )
> +            ret = -EINVAL;
> +        else if ( !allow_access_msr(ra->data->idx) )
> +            ret = -EACCES;
> +        else if ( ra->data->cmd == XEN_RESOURCE_OP_MSR_READ )
> +            ret = rdmsr_safe(ra->data->idx, ra->data->val);
> +        else
> +            ret = wrmsr_safe(ra->data->idx, ra->data->val);
> +        break;
> +    default:
> +        ret = -EINVAL;
> +        break;
> +    }
> +
> +    ra->ret = ret;
> +}
> +
>  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>  {
>      ret_t ret = 0;
> @@ -601,6 +637,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>      }
>      break;
>  
> +    case XENPF_resource_op:
> +    {
> +        struct xen_resource_access ra;
> +        struct xenpf_resource_op *rsc_op = &op->u.resource_op;
> +        unsigned int cpu = smp_processor_id();
> +
> +        ra.data = rsc_op;
> +
> +        if ( rsc_op->cpu == cpu )
> +            resource_access_one(&ra);
> +        else if ( cpu_online(rsc_op->cpu) )
> +            on_selected_cpus(cpumask_of(rsc_op->cpu),

You must validate rsc_op->cpu before using it.  cpumask_of(something
large) will happily wander off the end of an array.

> +                         resource_access_one, &ra, 1);
> +        else
> +        {
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        if ( ra.ret )
> +        {
> +            ret = ra.ret;
> +            break;
> +        }
> +    }
> +    break;
> +
>      default:
>          ret = -ENOSYS;
>          break;
> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> index 053b9fa..3ed50de 100644
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -527,6 +527,20 @@ struct xenpf_core_parking {
>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>  
> +#define XENPF_resource_op   61
> +
> +#define XEN_RESOURCE_OP_MSR_READ  0
> +#define XEN_RESOURCE_OP_MSR_WRITE 1
> +
> +struct xenpf_resource_op {
> +    uint16_t cmd;       /* XEN_RESOURCE_OP_* */

There is an alignment issue here between 32 and 64bit. Putting an an
explicit unit16_t _resd; field should fix it.

~Andrew

> +    uint32_t cpu;
> +    uint64_t idx;
> +    uint64_t val;
> +};
> +typedef struct xenpf_resource_op xenpf_resource_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t);
> +
>  /*
>   * ` enum neg_errnoval
>   * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
> @@ -553,6 +567,7 @@ struct xen_platform_op {
>          struct xenpf_cpu_hotadd        cpu_add;
>          struct xenpf_mem_hotadd        mem_add;
>          struct xenpf_core_parking      core_parking;
> +        struct xenpf_resource_op       resource_op;
>          uint8_t                        pad[128];
>      } u;
>  };
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 9a35dd7..06ed7b9 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -88,6 +88,7 @@
>  ?	xenpf_enter_acpi_sleep		platform.h
>  ?	xenpf_pcpuinfo			platform.h
>  ?	xenpf_pcpu_version		platform.h
> +?	xenpf_resource_op		platform.h
>  !	sched_poll			sched.h
>  ?	sched_remote_shutdown		sched.h
>  ?	sched_shutdown			sched.h

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature
  2014-09-05  8:37 ` [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature Chao Peng
@ 2014-09-05 11:05   ` Andrew Cooper
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Cooper @ 2014-09-05 11:05 UTC (permalink / raw)
  To: Chao Peng, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, JBeulich, dgdegra

On 05/09/14 09:37, Chao Peng wrote:
> Detect platform QoS feature status and enumerate the resource types,
> one of which is to monitor the L3 cache occupancy.
>
> Also introduce a Xen command line parameter to control the QoS
> feature status.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  docs/misc/xen-command-line.markdown |   14 +++++
>  xen/arch/x86/Makefile               |    1 +
>  xen/arch/x86/pqos.c                 |  108 +++++++++++++++++++++++++++++++++++
>  xen/arch/x86/setup.c                |    3 +
>  xen/include/asm-x86/cpufeature.h    |    1 +
>  xen/include/asm-x86/pqos.h          |   53 +++++++++++++++++
>  6 files changed, 180 insertions(+)
>  create mode 100644 xen/arch/x86/pqos.c
>  create mode 100644 xen/include/asm-x86/pqos.h
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index af93e17..fcda322 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1005,6 +1005,20 @@ This option can be specified more than once (up to 8 times at present).
>  ### ple\_window
>  > `= <integer>`
>  
> +### pqos (Intel)
> +> `= List of ( pqos_monitor:<boolean> | rmid_max:<integer> )`
> +
> +> Default: `pqos=pqos_monitor:0,rmid_max:255`
> +
> +Configure platform QoS services, which are available on Intel Haswell Server
> +family and future platforms.
> +
> +`pqos` instructs Xen to enable/disable platform QoS service.
> +
> +`pqos_monitor` instructs Xen to enable/disable QoS monitoring service.
> +
> +`rmid_max` indicates the max value for rmid.
> +
>  ### reboot
>  > `= t[riple] | k[bd] | a[cpi] | p[ci] | n[o] [, [w]arm | [c]old]`
>  
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index c1e244d..3d92fd7 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -59,6 +59,7 @@ obj-y += crash.o
>  obj-y += tboot.o
>  obj-y += hpet.o
>  obj-y += xstate.o
> +obj-y += pqos.o
>  
>  obj-$(crash_debug) += gdbstub.o
>  
> diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
> new file mode 100644
> index 0000000..aca9ea2
> --- /dev/null
> +++ b/xen/arch/x86/pqos.c
> @@ -0,0 +1,108 @@
> +/*
> + * pqos.c: Platform QoS related service for guest.
> + *
> + * Copyright (c) 2014, Intel Corporation
> + * Author: Dongxiao Xu <dongxiao.xu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#include <xen/init.h>
> +#include <xen/cpu.h>
> +#include <asm/pqos.h>
> +
> +#define PQOS_MONITOR        (1<<0)
> +
> +struct pqos_monitor *__read_mostly pqosm = NULL;
> +static bool_t __initdata opt_pqos = 0;
> +static unsigned int __initdata opt_rmid_max = 255;
> +
> +static void __init parse_pqos_param(char *s)
> +{
> +    char *ss, *val_str;
> +
> +    do {
> +        ss = strchr(s, ',');
> +        if ( ss )
> +            *ss = '\0';
> +
> +        val_str = strchr(s, ':');
> +        if ( val_str )
> +            *val_str++ = '\0';
> +
> +        if ( !strcmp(s, "pqos_monitor")
> +             && ( !val_str || parse_bool(val_str) == 1 )) {
> +            opt_pqos &= PQOS_MONITOR;
> +        } else if ( val_str && !strcmp(s, "rmid_max") )
> +            opt_rmid_max = simple_strtoul(val_str, NULL, 0);
> +
> +        s = ss + 1;
> +    } while ( ss );
> +}
> +custom_param("pqos", parse_pqos_param);
> +
> +static void __init init_pqos_monitor(unsigned int rmid_max)
> +{
> +    unsigned int eax, ebx, ecx, edx;
> +    unsigned int rmid;
> +
> +    if ( !boot_cpu_has(X86_FEATURE_QOSM) )
> +        return;
> +
> +    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &edx);
> +    if ( !edx )
> +        return;
> +
> +    pqosm = xzalloc(struct pqos_monitor);
> +    if ( !pqosm )
> +        return;
> +
> +    pqosm->qm_features = edx;
> +    pqosm->rmid_mask = ~(~0ull << get_count_order(ebx));
> +    pqosm->rmid_max = min(rmid_max, ebx);
> +
> +    if ( pqosm->qm_features & QOS_MONITOR_TYPE_L3 )
> +    {
> +        cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
> +        pqosm->l3m.upscaling_factor = ebx;
> +        pqosm->l3m.rmid_max = ecx;
> +        pqosm->l3m.l3_features = edx;
> +    }
> +
> +    pqosm->rmid_max = min(rmid_max, pqosm->l3m.rmid_max);
> +    BUG_ON(pqosm->rmid_max < 0xffffffff);

Given that rmid_max defaults to 255 and pqosm->rmid_max is unsigned, how
can this possibly work?

~Andrew

> +    pqosm->rmid_to_dom = xmalloc_array(domid_t, pqosm->rmid_max + 1);
> +    if ( !pqosm->rmid_to_dom )
> +    {
> +        xfree(pqosm);
> +        return;
> +    }
> +    /* Reserve RMID 0 for all domains not being monitored */
> +    pqosm->rmid_to_dom[0] = DOMID_XEN;
> +    for ( rmid = 1; rmid <= pqosm->rmid_max; rmid++ )
> +        pqosm->rmid_to_dom[rmid] = DOMID_INVALID;
> +
> +    printk(XENLOG_INFO "Platform QoS Monitoring Enabled.\n");
> +}
> +
> +void __init init_platform_qos(void)
> +{
> +    if ( opt_pqos && opt_rmid_max )
> +        init_pqos_monitor(opt_rmid_max);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 6a814cd..b4ad3a6 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -49,6 +49,7 @@
>  #include <xen/cpu.h>
>  #include <asm/nmi.h>
>  #include <asm/alternative.h>
> +#include <asm/pqos.h>
>  
>  /* opt_nosmp: If true, secondary processors are ignored. */
>  static bool_t __initdata opt_nosmp;
> @@ -1440,6 +1441,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      domain_unpause_by_systemcontroller(dom0);
>  
> +    init_platform_qos();
> +
>      reset_stack_and_jump(init_done);
>  }
>  
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index 8014241..eb1dd27 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -148,6 +148,7 @@
>  #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
>  #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
>  #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
> +#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
>  #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
>  #define X86_FEATURE_MPX		(7*32+14) /* Memory Protection Extensions */
>  #define X86_FEATURE_RDSEED	(7*32+18) /* RDSEED instruction */
> diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
> new file mode 100644
> index 0000000..96bfdc6
> --- /dev/null
> +++ b/xen/include/asm-x86/pqos.h
> @@ -0,0 +1,53 @@
> +/*
> + * pqos.h: Platform QoS related service for guest.
> + *
> + * Copyright (c) 2014, Intel Corporation
> + * Author: Dongxiao Xu <dongxiao.xu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#ifndef __ASM_PQOS_H__
> +#define __ASM_PQOS_H__
> +
> +/* QoS Resource Type Enumeration */
> +#define QOS_MONITOR_TYPE_L3            0x2
> +
> +/* L3 Monitoring Features */
> +#define L3_FEATURE_OCCUPANCY           0x1
> +
> +struct pqos_monitor_l3 {
> +    unsigned int l3_features;
> +    unsigned int upscaling_factor;
> +    unsigned int rmid_max;
> +};
> +
> +struct pqos_monitor {
> +    unsigned long rmid_mask;
> +    unsigned int rmid_max;
> +    unsigned int qm_features;
> +    domid_t *rmid_to_dom;
> +    struct pqos_monitor_l3 l3m;
> +};
> +
> +extern struct pqos_monitor *pqosm;
> +
> +void init_platform_qos(void);
> +
> +#endif /* __ASM_PQOS_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-05 10:59   ` Andrew Cooper
@ 2014-09-05 11:49     ` Jan Beulich
  2014-09-10  2:55     ` Chao Peng
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2014-09-05 11:49 UTC (permalink / raw)
  To: Andrew Cooper, Chao Peng, xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, dgdegra

>>> On 05.09.14 at 12:59, <andrew.cooper3@citrix.com> wrote:
> On 05/09/14 09:37, Chao Peng wrote:
>> --- a/xen/include/public/platform.h
>> +++ b/xen/include/public/platform.h
>> @@ -527,6 +527,20 @@ struct xenpf_core_parking {
>>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>  
>> +#define XENPF_resource_op   61
>> +
>> +#define XEN_RESOURCE_OP_MSR_READ  0
>> +#define XEN_RESOURCE_OP_MSR_WRITE 1
>> +
>> +struct xenpf_resource_op {
>> +    uint16_t cmd;       /* XEN_RESOURCE_OP_* */
> 
> There is an alignment issue here between 32 and 64bit. Putting an an
> explicit unit16_t _resd; field should fix it.

In fact there isn't (uint32_t is 4-byte aligned in both cases), but
want such padding be spelled out anyway. Even more, to allow
using such fields for future extensions, checking that the caller
passes zero is pretty desirable (read: I personally wouldn't
accept a patch that doesn't).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-05 10:46   ` Andrew Cooper
@ 2014-09-09  6:43     ` Chao Peng
  2014-09-09 10:39       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-09  6:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, JBeulich, dgdegra

On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
> On 05/09/14 09:37, Chao Peng wrote:
> > Add a flag to indicate if the execution can be preempted between two
> > calls. If not specified, stay preemptable.
> >
> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > ---
> >  xen/common/multicall.c   |    5 ++++-
> >  xen/include/public/xen.h |    4 ++++
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/common/multicall.c b/xen/common/multicall.c
> > index fa9d910..83b96eb 100644
> > --- a/xen/common/multicall.c
> > +++ b/xen/common/multicall.c
> > @@ -40,6 +40,7 @@ do_multicall(
> >      struct mc_state *mcs = &current->mc_state;
> >      uint32_t         i;
> >      int              rc = 0;
> > +    bool_t           preemptable = 0;
> >  
> >      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
> >      {
> > @@ -52,7 +53,7 @@ do_multicall(
> >  
> >      for ( i = 0; !rc && i < nr_calls; i++ )
> >      {
> > -        if ( i && hypercall_preempt_check() )
> > +        if ( preemptable && hypercall_preempt_check() )
> >              goto preempted;
> >  
> >          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
> > @@ -61,6 +62,8 @@ do_multicall(
> >              break;
> >          }
> >  
> > +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
> > +
> 
> Please consider what would happen if a malicious guest set NO_PREEMPT on
> every multicall entry.

OK, I see. My direct purpose here is to support batch operations for
XENPF_resource_op added in next patch. Recall what Jan suggested in v14
comments, we have 3 possible ways to support XENPF_resource_op batch:
1) Add a field in the xenpf_resource_op to indicate the iteration;
2) Fiddle multicall mechanism, just like this patch;
3) Add a brand new hypercall.

So perhaps I will give up option 2) before I can see any improvement
here. While option 3) is aggressive, so I'd go option 1) through I also
don't quite like it (Totally because the iteration is transparent for user).

Chao
> 
> IMO, two back-to-back entries whith NO_PREEMPT set is grounds for an
> immediate hypercall failure.
> 
> Furthermore, what happens if one of the multicall operations pre-empts
> of its own accord?
> 
> ~Andrew
> 
> >          trace_multicall_call(&mcs->call);
> >  
> >          do_multicall_call(&mcs->call);
> > diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> > index a6a2092..3f8b908 100644
> > --- a/xen/include/public/xen.h
> > +++ b/xen/include/public/xen.h
> > @@ -550,10 +550,14 @@ DEFINE_XEN_GUEST_HANDLE(mmu_update_t);
> >  struct multicall_entry {
> >      xen_ulong_t op, result;
> >      xen_ulong_t args[6];
> > +    uint32_t flags;
> >  };
> >  typedef struct multicall_entry multicall_entry_t;
> >  DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
> >  
> > +/* These flags passed in the 'flags' field of multicall_entry_t. */
> > +#define MC_NO_PREEMPT    (1<<0)  /* Can't be preempted before next entry? */
> > +
> >  #if __XEN_INTERFACE_VERSION__ < 0x00040400
> >  /*
> >   * Event channel endpoints per domain (when using the 2-level ABI):
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09  6:43     ` Chao Peng
@ 2014-09-09 10:39       ` Jan Beulich
  2014-09-09 10:51         ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-09 10:39 UTC (permalink / raw)
  To: Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Andrew Cooper, Ian.Jackson, xen-devel, dgdegra

>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>> On 05/09/14 09:37, Chao Peng wrote:
>> > Add a flag to indicate if the execution can be preempted between two
>> > calls. If not specified, stay preemptable.
>> >
>> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>> > ---
>> >  xen/common/multicall.c   |    5 ++++-
>> >  xen/include/public/xen.h |    4 ++++
>> >  2 files changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>> > index fa9d910..83b96eb 100644
>> > --- a/xen/common/multicall.c
>> > +++ b/xen/common/multicall.c
>> > @@ -40,6 +40,7 @@ do_multicall(
>> >      struct mc_state *mcs = &current->mc_state;
>> >      uint32_t         i;
>> >      int              rc = 0;
>> > +    bool_t           preemptable = 0;
>> >  
>> >      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>> >      {
>> > @@ -52,7 +53,7 @@ do_multicall(
>> >  
>> >      for ( i = 0; !rc && i < nr_calls; i++ )
>> >      {
>> > -        if ( i && hypercall_preempt_check() )
>> > +        if ( preemptable && hypercall_preempt_check() )
>> >              goto preempted;
>> >  
>> >          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>> > @@ -61,6 +62,8 @@ do_multicall(
>> >              break;
>> >          }
>> >  
>> > +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>> > +
>> 
>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>> every multicall entry.
> 
> OK, I see. My direct purpose here is to support batch operations for
> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
> comments, we have 3 possible ways to support XENPF_resource_op batch:
> 1) Add a field in the xenpf_resource_op to indicate the iteration;
> 2) Fiddle multicall mechanism, just like this patch;
> 3) Add a brand new hypercall.
> 
> So perhaps I will give up option 2) before I can see any improvement
> here. While option 3) is aggressive, so I'd go option 1) through I also
> don't quite like it (Totally because the iteration is transparent for user).

The I suppose you didn't really understand Andrew's comment: I
don't think he was suggesting to drop the approach, but instead
to implement it properly (read: securely).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09 10:39       ` Jan Beulich
@ 2014-09-09 10:51         ` Andrew Cooper
  2014-09-09 11:51           ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-09 10:51 UTC (permalink / raw)
  To: Jan Beulich, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

On 09/09/14 11:39, Jan Beulich wrote:
>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>>> On 05/09/14 09:37, Chao Peng wrote:
>>>> Add a flag to indicate if the execution can be preempted between two
>>>> calls. If not specified, stay preemptable.
>>>>
>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>>>> ---
>>>>  xen/common/multicall.c   |    5 ++++-
>>>>  xen/include/public/xen.h |    4 ++++
>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>>>> index fa9d910..83b96eb 100644
>>>> --- a/xen/common/multicall.c
>>>> +++ b/xen/common/multicall.c
>>>> @@ -40,6 +40,7 @@ do_multicall(
>>>>      struct mc_state *mcs = &current->mc_state;
>>>>      uint32_t         i;
>>>>      int              rc = 0;
>>>> +    bool_t           preemptable = 0;
>>>>  
>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>>>>      {
>>>> @@ -52,7 +53,7 @@ do_multicall(
>>>>  
>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
>>>>      {
>>>> -        if ( i && hypercall_preempt_check() )
>>>> +        if ( preemptable && hypercall_preempt_check() )
>>>>              goto preempted;
>>>>  
>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>>>> @@ -61,6 +62,8 @@ do_multicall(
>>>>              break;
>>>>          }
>>>>  
>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>>>> +
>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>>> every multicall entry.
>> OK, I see. My direct purpose here is to support batch operations for
>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
>> comments, we have 3 possible ways to support XENPF_resource_op batch:
>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
>> 2) Fiddle multicall mechanism, just like this patch;
>> 3) Add a brand new hypercall.
>>
>> So perhaps I will give up option 2) before I can see any improvement
>> here. While option 3) is aggressive, so I'd go option 1) through I also
>> don't quite like it (Totally because the iteration is transparent for user).
> The I suppose you didn't really understand Andrew's comment: I
> don't think he was suggesting to drop the approach, but instead
> to implement it properly (read: securely).
>
> Jan
>

That is certainly one part of it.

However, there is the other open question (dropped from this context) of
how to deal with a multicall which has NO_PREEMT set, which itself
preempts, and I don't have a good answer for this.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09 10:51         ` Andrew Cooper
@ 2014-09-09 11:51           ` Jan Beulich
  2014-09-09 12:44             ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-09 11:51 UTC (permalink / raw)
  To: Andrew Cooper, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

>>> On 09.09.14 at 12:51, <andrew.cooper3@citrix.com> wrote:
> On 09/09/14 11:39, Jan Beulich wrote:
>>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
>>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>>>> On 05/09/14 09:37, Chao Peng wrote:
>>>>> Add a flag to indicate if the execution can be preempted between two
>>>>> calls. If not specified, stay preemptable.
>>>>>
>>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>>>>> ---
>>>>>  xen/common/multicall.c   |    5 ++++-
>>>>>  xen/include/public/xen.h |    4 ++++
>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>>>>> index fa9d910..83b96eb 100644
>>>>> --- a/xen/common/multicall.c
>>>>> +++ b/xen/common/multicall.c
>>>>> @@ -40,6 +40,7 @@ do_multicall(
>>>>>      struct mc_state *mcs = &current->mc_state;
>>>>>      uint32_t         i;
>>>>>      int              rc = 0;
>>>>> +    bool_t           preemptable = 0;
>>>>>  
>>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>>>>>      {
>>>>> @@ -52,7 +53,7 @@ do_multicall(
>>>>>  
>>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
>>>>>      {
>>>>> -        if ( i && hypercall_preempt_check() )
>>>>> +        if ( preemptable && hypercall_preempt_check() )
>>>>>              goto preempted;
>>>>>  
>>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>>>>> @@ -61,6 +62,8 @@ do_multicall(
>>>>>              break;
>>>>>          }
>>>>>  
>>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>>>>> +
>>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>>>> every multicall entry.
>>> OK, I see. My direct purpose here is to support batch operations for
>>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
>>> comments, we have 3 possible ways to support XENPF_resource_op batch:
>>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
>>> 2) Fiddle multicall mechanism, just like this patch;
>>> 3) Add a brand new hypercall.
>>>
>>> So perhaps I will give up option 2) before I can see any improvement
>>> here. While option 3) is aggressive, so I'd go option 1) through I also
>>> don't quite like it (Totally because the iteration is transparent for user).
>> The I suppose you didn't really understand Andrew's comment: I
>> don't think he was suggesting to drop the approach, but instead
>> to implement it properly (read: securely).
> 
> That is certainly one part of it.
> 
> However, there is the other open question (dropped from this context) of
> how to deal with a multicall which has NO_PREEMT set, which itself
> preempts, and I don't have a good answer for this.

The pretty natural answer to this is - the specific handler knows
best what to do.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09 11:51           ` Jan Beulich
@ 2014-09-09 12:44             ` Andrew Cooper
  2014-09-09 13:15               ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-09 12:44 UTC (permalink / raw)
  To: Jan Beulich, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

On 09/09/14 12:51, Jan Beulich wrote:
>>>> On 09.09.14 at 12:51, <andrew.cooper3@citrix.com> wrote:
>> On 09/09/14 11:39, Jan Beulich wrote:
>>>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
>>>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>>>>> On 05/09/14 09:37, Chao Peng wrote:
>>>>>> Add a flag to indicate if the execution can be preempted between two
>>>>>> calls. If not specified, stay preemptable.
>>>>>>
>>>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>>>>>> ---
>>>>>>  xen/common/multicall.c   |    5 ++++-
>>>>>>  xen/include/public/xen.h |    4 ++++
>>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>>>>>> index fa9d910..83b96eb 100644
>>>>>> --- a/xen/common/multicall.c
>>>>>> +++ b/xen/common/multicall.c
>>>>>> @@ -40,6 +40,7 @@ do_multicall(
>>>>>>      struct mc_state *mcs = &current->mc_state;
>>>>>>      uint32_t         i;
>>>>>>      int              rc = 0;
>>>>>> +    bool_t           preemptable = 0;
>>>>>>  
>>>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>>>>>>      {
>>>>>> @@ -52,7 +53,7 @@ do_multicall(
>>>>>>  
>>>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
>>>>>>      {
>>>>>> -        if ( i && hypercall_preempt_check() )
>>>>>> +        if ( preemptable && hypercall_preempt_check() )
>>>>>>              goto preempted;
>>>>>>  
>>>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>>>>>> @@ -61,6 +62,8 @@ do_multicall(
>>>>>>              break;
>>>>>>          }
>>>>>>  
>>>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>>>>>> +
>>>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>>>>> every multicall entry.
>>>> OK, I see. My direct purpose here is to support batch operations for
>>>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
>>>> comments, we have 3 possible ways to support XENPF_resource_op batch:
>>>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
>>>> 2) Fiddle multicall mechanism, just like this patch;
>>>> 3) Add a brand new hypercall.
>>>>
>>>> So perhaps I will give up option 2) before I can see any improvement
>>>> here. While option 3) is aggressive, so I'd go option 1) through I also
>>>> don't quite like it (Totally because the iteration is transparent for user).
>>> The I suppose you didn't really understand Andrew's comment: I
>>> don't think he was suggesting to drop the approach, but instead
>>> to implement it properly (read: securely).
>> That is certainly one part of it.
>>
>> However, there is the other open question (dropped from this context) of
>> how to deal with a multicall which has NO_PREEMT set, which itself
>> preempts, and I don't have a good answer for this.
> The pretty natural answer to this is - the specific handler knows
> best what to do.
>
> Jan
>

Given our past history at retrofitting preempting into existing
hypercalls, the multicaller has no idea whether the ops they have
selected will preempt or not, and no way to guarentee that the behaviour
will stay the same in future.

The multicall dispatches to the regular hypercall handlers, which
(cant?) and certainly shouldn't distinguish between a regular hypercall
and multicall.

As I have been looking through this code, I have noticed that the NDEBUG
parameter corruption will break half of our existing preemption logic,
which does use some of the parameters to hold preemption information.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09 12:44             ` Andrew Cooper
@ 2014-09-09 13:15               ` Jan Beulich
  2014-09-10  1:32                 ` Chao Peng
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-09 13:15 UTC (permalink / raw)
  To: Andrew Cooper, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

>>> On 09.09.14 at 14:44, <andrew.cooper3@citrix.com> wrote:
> On 09/09/14 12:51, Jan Beulich wrote:
>>>>> On 09.09.14 at 12:51, <andrew.cooper3@citrix.com> wrote:
>>> On 09/09/14 11:39, Jan Beulich wrote:
>>>>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
>>>>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>>>>>> On 05/09/14 09:37, Chao Peng wrote:
>>>>>>> Add a flag to indicate if the execution can be preempted between two
>>>>>>> calls. If not specified, stay preemptable.
>>>>>>>
>>>>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>>>>>>> ---
>>>>>>>  xen/common/multicall.c   |    5 ++++-
>>>>>>>  xen/include/public/xen.h |    4 ++++
>>>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>>>>>>> index fa9d910..83b96eb 100644
>>>>>>> --- a/xen/common/multicall.c
>>>>>>> +++ b/xen/common/multicall.c
>>>>>>> @@ -40,6 +40,7 @@ do_multicall(
>>>>>>>      struct mc_state *mcs = &current->mc_state;
>>>>>>>      uint32_t         i;
>>>>>>>      int              rc = 0;
>>>>>>> +    bool_t           preemptable = 0;
>>>>>>>  
>>>>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>>>>>>>      {
>>>>>>> @@ -52,7 +53,7 @@ do_multicall(
>>>>>>>  
>>>>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
>>>>>>>      {
>>>>>>> -        if ( i && hypercall_preempt_check() )
>>>>>>> +        if ( preemptable && hypercall_preempt_check() )
>>>>>>>              goto preempted;
>>>>>>>  
>>>>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>>>>>>> @@ -61,6 +62,8 @@ do_multicall(
>>>>>>>              break;
>>>>>>>          }
>>>>>>>  
>>>>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>>>>>>> +
>>>>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>>>>>> every multicall entry.
>>>>> OK, I see. My direct purpose here is to support batch operations for
>>>>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
>>>>> comments, we have 3 possible ways to support XENPF_resource_op batch:
>>>>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
>>>>> 2) Fiddle multicall mechanism, just like this patch;
>>>>> 3) Add a brand new hypercall.
>>>>>
>>>>> So perhaps I will give up option 2) before I can see any improvement
>>>>> here. While option 3) is aggressive, so I'd go option 1) through I also
>>>>> don't quite like it (Totally because the iteration is transparent for user).
>>>> The I suppose you didn't really understand Andrew's comment: I
>>>> don't think he was suggesting to drop the approach, but instead
>>>> to implement it properly (read: securely).
>>> That is certainly one part of it.
>>>
>>> However, there is the other open question (dropped from this context) of
>>> how to deal with a multicall which has NO_PREEMT set, which itself
>>> preempts, and I don't have a good answer for this.
>> The pretty natural answer to this is - the specific handler knows
>> best what to do.
> 
> Given our past history at retrofitting preempting into existing
> hypercalls, the multicaller has no idea whether the ops they have
> selected will preempt or not, and no way to guarentee that the behaviour
> will stay the same in future.
> 
> The multicall dispatches to the regular hypercall handlers, which
> (cant?)

They can - current->mc_state.flags has _MCSF_in_multicall
set.

> and certainly shouldn't distinguish between a regular hypercall
> and multicall.

I agree with this. Yet it's a bug in the caller to request no
preemption at this layer for a constituent hypercall that can itself
preempt. But that's only a problem for the caller, not for the
hypervisor.

> As I have been looking through this code, I have noticed that the NDEBUG
> parameter corruption will break half of our existing preemption logic,
> which does use some of the parameters to hold preemption information.

Certainly not - call_list is being copied over a second time a few
lines after that NDEBUG section.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-09 13:15               ` Jan Beulich
@ 2014-09-10  1:32                 ` Chao Peng
  2014-09-10  9:43                   ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-10  1:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Andrew Cooper, Ian.Jackson, xen-devel, dgdegra

On Tue, Sep 09, 2014 at 02:15:59PM +0100, Jan Beulich wrote:
> >>> On 09.09.14 at 14:44, <andrew.cooper3@citrix.com> wrote:
> > On 09/09/14 12:51, Jan Beulich wrote:
> >>>>> On 09.09.14 at 12:51, <andrew.cooper3@citrix.com> wrote:
> >>> On 09/09/14 11:39, Jan Beulich wrote:
> >>>>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
> >>>>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
> >>>>>> On 05/09/14 09:37, Chao Peng wrote:
> >>>>>>> Add a flag to indicate if the execution can be preempted between two
> >>>>>>> calls. If not specified, stay preemptable.
> >>>>>>>
> >>>>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> >>>>>>> ---
> >>>>>>>  xen/common/multicall.c   |    5 ++++-
> >>>>>>>  xen/include/public/xen.h |    4 ++++
> >>>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
> >>>>>>> index fa9d910..83b96eb 100644
> >>>>>>> --- a/xen/common/multicall.c
> >>>>>>> +++ b/xen/common/multicall.c
> >>>>>>> @@ -40,6 +40,7 @@ do_multicall(
> >>>>>>>      struct mc_state *mcs = &current->mc_state;
> >>>>>>>      uint32_t         i;
> >>>>>>>      int              rc = 0;
> >>>>>>> +    bool_t           preemptable = 0;
> >>>>>>>  
> >>>>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
> >>>>>>>      {
> >>>>>>> @@ -52,7 +53,7 @@ do_multicall(
> >>>>>>>  
> >>>>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
> >>>>>>>      {
> >>>>>>> -        if ( i && hypercall_preempt_check() )
> >>>>>>> +        if ( preemptable && hypercall_preempt_check() )
> >>>>>>>              goto preempted;
> >>>>>>>  
> >>>>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
> >>>>>>> @@ -61,6 +62,8 @@ do_multicall(
> >>>>>>>              break;
> >>>>>>>          }
> >>>>>>>  
> >>>>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
> >>>>>>> +
> >>>>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
> >>>>>> every multicall entry.
> >>>>> OK, I see. My direct purpose here is to support batch operations for
> >>>>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
> >>>>> comments, we have 3 possible ways to support XENPF_resource_op batch:
> >>>>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
> >>>>> 2) Fiddle multicall mechanism, just like this patch;
> >>>>> 3) Add a brand new hypercall.
> >>>>>
> >>>>> So perhaps I will give up option 2) before I can see any improvement
> >>>>> here. While option 3) is aggressive, so I'd go option 1) through I also
> >>>>> don't quite like it (Totally because the iteration is transparent for user).
> >>>> The I suppose you didn't really understand Andrew's comment: I
> >>>> don't think he was suggesting to drop the approach, but instead
> >>>> to implement it properly (read: securely).
> >>> That is certainly one part of it.
> >>>
> >>> However, there is the other open question (dropped from this context) of
> >>> how to deal with a multicall which has NO_PREEMT set, which itself
> >>> preempts, and I don't have a good answer for this.
> >> The pretty natural answer to this is - the specific handler knows
> >> best what to do.
> > 
> > Given our past history at retrofitting preempting into existing
> > hypercalls, the multicaller has no idea whether the ops they have
> > selected will preempt or not, and no way to guarentee that the behaviour
> > will stay the same in future.
> > 
> > The multicall dispatches to the regular hypercall handlers, which
> > (cant?)
> 
> They can - current->mc_state.flags has _MCSF_in_multicall
> set.
> 
> > and certainly shouldn't distinguish between a regular hypercall
> > and multicall.
> 
> I agree with this. Yet it's a bug in the caller to request no
> preemption at this layer for a constituent hypercall that can itself
> preempt. But that's only a problem for the caller, not for the
> hypervisor.
> 
> > As I have been looking through this code, I have noticed that the NDEBUG
> > parameter corruption will break half of our existing preemption logic,
> > which does use some of the parameters to hold preemption information.
> 
> Certainly not - call_list is being copied over a second time a few
> lines after that NDEBUG section.
> 
> Jan
> 
Clear, thank you two for your discussion.
So the only thing need to be done here is fixing the potential security
issue, right? I will follow this.

Thanks,
Chao
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-05 10:59   ` Andrew Cooper
  2014-09-05 11:49     ` Jan Beulich
@ 2014-09-10  2:55     ` Chao Peng
  2014-09-29 18:52       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-10  2:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, JBeulich, dgdegra

On Fri, Sep 05, 2014 at 11:59:11AM +0100, Andrew Cooper wrote:
> On 05/09/14 09:37, Chao Peng wrote:
> > Add a generic resource access hypercall for tool stack or other
> > components, e.g., accessing MSR, port I/O, etc.
> >
> > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > ---
> >  xen/arch/x86/platform_hypercall.c |   63 +++++++++++++++++++++++++++++++++++++
> >  xen/include/public/platform.h     |   15 +++++++++
> >  xen/include/xlat.lst              |    1 +
> >  3 files changed, 79 insertions(+)
> >
> > diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> > index 2162811..e5ad4c9 100644
> > --- a/xen/arch/x86/platform_hypercall.c
> > +++ b/xen/arch/x86/platform_hypercall.c
> > @@ -61,6 +61,42 @@ long cpu_down_helper(void *data);
> >  long core_parking_helper(void *data);
> >  uint32_t get_cur_idle_nums(void);
> >  
> > +struct xen_resource_access {
> > +    int32_t ret;
> > +    struct xenpf_resource_op *data;
> > +};
> > +
> > +static bool_t allow_access_msr(unsigned int msr)
> > +{
> > +    return 0;
> > +}
> > +
> > +static void resource_access_one(void *info)
> > +{
> > +    struct xen_resource_access *ra = info;
> > +    int ret = 0;
> > +
> > +    switch ( ra->data->cmd )
> > +    {
> > +    case XEN_RESOURCE_OP_MSR_READ:
> > +    case XEN_RESOURCE_OP_MSR_WRITE:
> > +        if ( ra->data->idx >> 32 )
> > +            ret = -EINVAL;
> > +        else if ( !allow_access_msr(ra->data->idx) )
> > +            ret = -EACCES;
> > +        else if ( ra->data->cmd == XEN_RESOURCE_OP_MSR_READ )
> > +            ret = rdmsr_safe(ra->data->idx, ra->data->val);
> > +        else
> > +            ret = wrmsr_safe(ra->data->idx, ra->data->val);
> > +        break;
> > +    default:
> > +        ret = -EINVAL;
> > +        break;
> > +    }
> > +
> > +    ra->ret = ret;
> > +}
> > +
> >  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
> >  {
> >      ret_t ret = 0;
> > @@ -601,6 +637,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
> >      }
> >      break;
> >  
> > +    case XENPF_resource_op:
> > +    {
> > +        struct xen_resource_access ra;
> > +        struct xenpf_resource_op *rsc_op = &op->u.resource_op;
> > +        unsigned int cpu = smp_processor_id();
> > +
> > +        ra.data = rsc_op;
> > +
> > +        if ( rsc_op->cpu == cpu )
> > +            resource_access_one(&ra);
> > +        else if ( cpu_online(rsc_op->cpu) )
> > +            on_selected_cpus(cpumask_of(rsc_op->cpu),
> 
> You must validate rsc_op->cpu before using it.  cpumask_of(something
> large) will happily wander off the end of an array.
cpu_online() should detect this.
> 
> > +                         resource_access_one, &ra, 1);
> > +        else
> > +        {
> > +            ret = -ENODEV;
> > +            break;
> > +        }
> > +
> > +        if ( ra.ret )
> > +        {
> > +            ret = ra.ret;
> > +            break;
> > +        }
> > +    }
> > +    break;
> > +
> >      default:
> >          ret = -ENOSYS;
> >          break;
> > diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> > index 053b9fa..3ed50de 100644
> > --- a/xen/include/public/platform.h
> > +++ b/xen/include/public/platform.h
> > @@ -527,6 +527,20 @@ struct xenpf_core_parking {
> >  typedef struct xenpf_core_parking xenpf_core_parking_t;
> >  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
> >  
> > +#define XENPF_resource_op   61
> > +
> > +#define XEN_RESOURCE_OP_MSR_READ  0
> > +#define XEN_RESOURCE_OP_MSR_WRITE 1
> > +
> > +struct xenpf_resource_op {
> > +    uint16_t cmd;       /* XEN_RESOURCE_OP_* */
> 
> There is an alignment issue here between 32 and 64bit. Putting an an
> explicit unit16_t _resd; field should fix it.
> 
> ~Andrew
OK, thanks.
> 
> > +    uint32_t cpu;
> > +    uint64_t idx;
> > +    uint64_t val;
> > +};
> > +typedef struct xenpf_resource_op xenpf_resource_op_t;
> > +DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t);
> > +
> >  /*
> >   * ` enum neg_errnoval
> >   * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
> > @@ -553,6 +567,7 @@ struct xen_platform_op {
> >          struct xenpf_cpu_hotadd        cpu_add;
> >          struct xenpf_mem_hotadd        mem_add;
> >          struct xenpf_core_parking      core_parking;
> > +        struct xenpf_resource_op       resource_op;
> >          uint8_t                        pad[128];
> >      } u;
> >  };
> > diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> > index 9a35dd7..06ed7b9 100644
> > --- a/xen/include/xlat.lst
> > +++ b/xen/include/xlat.lst
> > @@ -88,6 +88,7 @@
> >  ?	xenpf_enter_acpi_sleep		platform.h
> >  ?	xenpf_pcpuinfo			platform.h
> >  ?	xenpf_pcpu_version		platform.h
> > +?	xenpf_resource_op		platform.h
> >  !	sched_poll			sched.h
> >  ?	sched_remote_shutdown		sched.h
> >  ?	sched_shutdown			sched.h
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10  1:32                 ` Chao Peng
@ 2014-09-10  9:43                   ` Andrew Cooper
  2014-09-10 10:07                     ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-10  9:43 UTC (permalink / raw)
  To: Chao Peng, Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

On 10/09/14 02:32, Chao Peng wrote:
> On Tue, Sep 09, 2014 at 02:15:59PM +0100, Jan Beulich wrote:
>>>>> On 09.09.14 at 14:44, <andrew.cooper3@citrix.com> wrote:
>>> On 09/09/14 12:51, Jan Beulich wrote:
>>>>>>> On 09.09.14 at 12:51, <andrew.cooper3@citrix.com> wrote:
>>>>> On 09/09/14 11:39, Jan Beulich wrote:
>>>>>>>>> On 09.09.14 at 08:43, <chao.p.peng@linux.intel.com> wrote:
>>>>>>> On Fri, Sep 05, 2014 at 11:46:20AM +0100, Andrew Cooper wrote:
>>>>>>>> On 05/09/14 09:37, Chao Peng wrote:
>>>>>>>>> Add a flag to indicate if the execution can be preempted between two
>>>>>>>>> calls. If not specified, stay preemptable.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>>>>>>>>> ---
>>>>>>>>>  xen/common/multicall.c   |    5 ++++-
>>>>>>>>>  xen/include/public/xen.h |    4 ++++
>>>>>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/xen/common/multicall.c b/xen/common/multicall.c
>>>>>>>>> index fa9d910..83b96eb 100644
>>>>>>>>> --- a/xen/common/multicall.c
>>>>>>>>> +++ b/xen/common/multicall.c
>>>>>>>>> @@ -40,6 +40,7 @@ do_multicall(
>>>>>>>>>      struct mc_state *mcs = &current->mc_state;
>>>>>>>>>      uint32_t         i;
>>>>>>>>>      int              rc = 0;
>>>>>>>>> +    bool_t           preemptable = 0;
>>>>>>>>>  
>>>>>>>>>      if ( unlikely(__test_and_set_bit(_MCSF_in_multicall, &mcs->flags)) )
>>>>>>>>>      {
>>>>>>>>> @@ -52,7 +53,7 @@ do_multicall(
>>>>>>>>>  
>>>>>>>>>      for ( i = 0; !rc && i < nr_calls; i++ )
>>>>>>>>>      {
>>>>>>>>> -        if ( i && hypercall_preempt_check() )
>>>>>>>>> +        if ( preemptable && hypercall_preempt_check() )
>>>>>>>>>              goto preempted;
>>>>>>>>>  
>>>>>>>>>          if ( unlikely(__copy_from_guest(&mcs->call, call_list, 1)) )
>>>>>>>>> @@ -61,6 +62,8 @@ do_multicall(
>>>>>>>>>              break;
>>>>>>>>>          }
>>>>>>>>>  
>>>>>>>>> +        preemptable = mcs->call.flags & MC_NO_PREEMPT;
>>>>>>>>> +
>>>>>>>> Please consider what would happen if a malicious guest set NO_PREEMPT on
>>>>>>>> every multicall entry.
>>>>>>> OK, I see. My direct purpose here is to support batch operations for
>>>>>>> XENPF_resource_op added in next patch. Recall what Jan suggested in v14
>>>>>>> comments, we have 3 possible ways to support XENPF_resource_op batch:
>>>>>>> 1) Add a field in the xenpf_resource_op to indicate the iteration;
>>>>>>> 2) Fiddle multicall mechanism, just like this patch;
>>>>>>> 3) Add a brand new hypercall.
>>>>>>>
>>>>>>> So perhaps I will give up option 2) before I can see any improvement
>>>>>>> here. While option 3) is aggressive, so I'd go option 1) through I also
>>>>>>> don't quite like it (Totally because the iteration is transparent for user).
>>>>>> The I suppose you didn't really understand Andrew's comment: I
>>>>>> don't think he was suggesting to drop the approach, but instead
>>>>>> to implement it properly (read: securely).
>>>>> That is certainly one part of it.
>>>>>
>>>>> However, there is the other open question (dropped from this context) of
>>>>> how to deal with a multicall which has NO_PREEMT set, which itself
>>>>> preempts, and I don't have a good answer for this.
>>>> The pretty natural answer to this is - the specific handler knows
>>>> best what to do.
>>> Given our past history at retrofitting preempting into existing
>>> hypercalls, the multicaller has no idea whether the ops they have
>>> selected will preempt or not, and no way to guarentee that the behaviour
>>> will stay the same in future.
>>>
>>> The multicall dispatches to the regular hypercall handlers, which
>>> (cant?)
>> They can - current->mc_state.flags has _MCSF_in_multicall
>> set.
>>
>>> and certainly shouldn't distinguish between a regular hypercall
>>> and multicall.
>> I agree with this. Yet it's a bug in the caller to request no
>> preemption at this layer for a constituent hypercall that can itself
>> preempt. But that's only a problem for the caller, not for the
>> hypervisor.
>>
>>> As I have been looking through this code, I have noticed that the NDEBUG
>>> parameter corruption will break half of our existing preemption logic,
>>> which does use some of the parameters to hold preemption information.
>> Certainly not - call_list is being copied over a second time a few
>> lines after that NDEBUG section.
>>
>> Jan
>>
> Clear, thank you two for your discussion.
> So the only thing need to be done here is fixing the potential security
> issue, right? I will follow this.
>
> Thanks,
> Chao

Actually, on further thought, using multicalls like this cannot possibly
be correct from a functional point of view.

Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
there is no guarantee that accesses to remote cpus msrs won't interleave
with a different natural access, clobbering the results of the wrmsr.

However this is solved, the wrmsr/rdmsr pair *must* be part of the same
synchronous thread of execution on the appropriate cpu.  You can trust
that interrupts won't play with these msrs, but you absolutely can't
guarantee that IPI/wrmsr/IPI/rdmsr will work.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10  9:43                   ` Andrew Cooper
@ 2014-09-10 10:07                     ` Jan Beulich
  2014-09-10 10:15                       ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-10 10:07 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, Chao Peng, dgdegra

>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
> Actually, on further thought, using multicalls like this cannot possibly
> be correct from a functional point of view.
> 
> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
> there is no guarantee that accesses to remote cpus msrs won't interleave
> with a different natural access, clobbering the results of the wrmsr.
> 
> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
> synchronous thread of execution on the appropriate cpu.  You can trust
> that interrupts won't play with these msrs, but you absolutely can't
> guarantee that IPI/wrmsr/IPI/rdmsr will work.

Not sure I follow, particularly in the context of the white listing of
MSRs permitted here (which ought to not include anything the
hypervisor needs control over).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10 10:07                     ` Jan Beulich
@ 2014-09-10 10:15                       ` Andrew Cooper
  2014-09-10 10:25                         ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-10 10:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, Chao Peng, dgdegra

On 10/09/14 11:07, Jan Beulich wrote:
>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>> Actually, on further thought, using multicalls like this cannot possibly
>> be correct from a functional point of view.
>>
>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
>> there is no guarantee that accesses to remote cpus msrs won't interleave
>> with a different natural access, clobbering the results of the wrmsr.
>>
>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>> synchronous thread of execution on the appropriate cpu.  You can trust
>> that interrupts won't play with these msrs, but you absolutely can't
>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
> Not sure I follow, particularly in the context of the white listing of
> MSRs permitted here (which ought to not include anything the
> hypervisor needs control over).
>
> Jan
>

Consider two dom0 vcpus both using this new multicall mechanism to read
QoS information for different domains, which end up both targeting the
same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
may interleave and clobber the first wrmsr.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10 10:15                       ` Andrew Cooper
@ 2014-09-10 10:25                         ` Jan Beulich
  2014-09-10 11:12                           ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-10 10:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, Chao Peng, dgdegra

>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
> On 10/09/14 11:07, Jan Beulich wrote:
>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>> Actually, on further thought, using multicalls like this cannot possibly
>>> be correct from a functional point of view.
>>>
>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
>>> there is no guarantee that accesses to remote cpus msrs won't interleave
>>> with a different natural access, clobbering the results of the wrmsr.
>>>
>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>>> synchronous thread of execution on the appropriate cpu.  You can trust
>>> that interrupts won't play with these msrs, but you absolutely can't
>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
>> Not sure I follow, particularly in the context of the white listing of
>> MSRs permitted here (which ought to not include anything the
>> hypervisor needs control over).
> 
> Consider two dom0 vcpus both using this new multicall mechanism to read
> QoS information for different domains, which end up both targeting the
> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
> may interleave and clobber the first wrmsr.

But that situation doesn't result from the multicall use here - it would
equally be the case for an inherently batchable hypercall. To deal with
that we'd need a wrmsr-then-rdmsr operation, or move the entire
execution of the batch onto the target CPU. Since the former would
quickly become unwieldy for more complex operations, I think this
gets us back to aiming at using continue_hypercall_on_cpu() here.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10 10:25                         ` Jan Beulich
@ 2014-09-10 11:12                           ` Andrew Cooper
  2014-09-12  2:55                             ` Chao Peng
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2014-09-10 11:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, Chao Peng, dgdegra

On 10/09/14 11:25, Jan Beulich wrote:
>>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
>> On 10/09/14 11:07, Jan Beulich wrote:
>>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>> Actually, on further thought, using multicalls like this cannot possibly
>>>> be correct from a functional point of view.
>>>>
>>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
>>>> there is no guarantee that accesses to remote cpus msrs won't interleave
>>>> with a different natural access, clobbering the results of the wrmsr.
>>>>
>>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>>>> synchronous thread of execution on the appropriate cpu.  You can trust
>>>> that interrupts won't play with these msrs, but you absolutely can't
>>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
>>> Not sure I follow, particularly in the context of the white listing of
>>> MSRs permitted here (which ought to not include anything the
>>> hypervisor needs control over).
>> Consider two dom0 vcpus both using this new multicall mechanism to read
>> QoS information for different domains, which end up both targeting the
>> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
>> may interleave and clobber the first wrmsr.
> But that situation doesn't result from the multicall use here - it would
> equally be the case for an inherently batchable hypercall.

Indeed - I called out multicall because of the current implementation,
but I should have been more clear.

> To deal with
> that we'd need a wrmsr-then-rdmsr operation, or move the entire
> execution of the batch onto the target CPU. Since the former would
> quickly become unwieldy for more complex operations, I think this
> gets us back to aiming at using continue_hypercall_on_cpu() here.

Which gets us back to the problem that you cannot use
copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
in the wrong context.


I think this requires a step back and rethink.  I can't offhand think of
any combination of existing bits of infrastructure which will allow this
to work correctly, which means something new needs designing.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-10 11:12                           ` Andrew Cooper
@ 2014-09-12  2:55                             ` Chao Peng
  2014-09-17  9:22                               ` Chao Peng
  0 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-12  2:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, Jan Beulich, dgdegra

On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
> On 10/09/14 11:25, Jan Beulich wrote:
> >>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
> >> On 10/09/14 11:07, Jan Beulich wrote:
> >>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
> >>>> Actually, on further thought, using multicalls like this cannot possibly
> >>>> be correct from a functional point of view.
> >>>>
> >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
> >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
> >>>> with a different natural access, clobbering the results of the wrmsr.
> >>>>
> >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
> >>>> synchronous thread of execution on the appropriate cpu.  You can trust
> >>>> that interrupts won't play with these msrs, but you absolutely can't
> >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
> >>> Not sure I follow, particularly in the context of the white listing of
> >>> MSRs permitted here (which ought to not include anything the
> >>> hypervisor needs control over).
> >> Consider two dom0 vcpus both using this new multicall mechanism to read
> >> QoS information for different domains, which end up both targeting the
> >> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
> >> may interleave and clobber the first wrmsr.
> > But that situation doesn't result from the multicall use here - it would
> > equally be the case for an inherently batchable hypercall.
> 
> Indeed - I called out multicall because of the current implementation,
> but I should have been more clear.
> 
> > To deal with
> > that we'd need a wrmsr-then-rdmsr operation, or move the entire
> > execution of the batch onto the target CPU. Since the former would
> > quickly become unwieldy for more complex operations, I think this
> > gets us back to aiming at using continue_hypercall_on_cpu() here.
> 
> Which gets us back to the problem that you cannot use
> copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
> in the wrong context.
> 
> 
> I think this requires a step back and rethink.  I can't offhand think of
> any combination of existing bits of infrastructure which will allow this
> to work correctly, which means something new needs designing.
> 
How about this:

1)  Still do the batch in do_platform_op() but add a iteration field in
the interface structure.

2)  Still use on_selected_cpus() but group the adjacent resource_ops
which have a same cpu and NO_PREEMPT set into one and do it as a whole
in the new cpu context.

Chao

> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-12  2:55                             ` Chao Peng
@ 2014-09-17  9:22                               ` Chao Peng
  2014-09-17  9:44                                 ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-17  9:22 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

On Fri, Sep 12, 2014 at 10:55:43AM +0800, Chao Peng wrote:
> On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
> > On 10/09/14 11:25, Jan Beulich wrote:
> > >>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
> > >> On 10/09/14 11:07, Jan Beulich wrote:
> > >>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
> > >>>> Actually, on further thought, using multicalls like this cannot possibly
> > >>>> be correct from a functional point of view.
> > >>>>
> > >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
> > >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
> > >>>> with a different natural access, clobbering the results of the wrmsr.
> > >>>>
> > >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
> > >>>> synchronous thread of execution on the appropriate cpu.  You can trust
> > >>>> that interrupts won't play with these msrs, but you absolutely can't
> > >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
> > >>> Not sure I follow, particularly in the context of the white listing of
> > >>> MSRs permitted here (which ought to not include anything the
> > >>> hypervisor needs control over).
> > >> Consider two dom0 vcpus both using this new multicall mechanism to read
> > >> QoS information for different domains, which end up both targeting the
> > >> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
> > >> may interleave and clobber the first wrmsr.
> > > But that situation doesn't result from the multicall use here - it would
> > > equally be the case for an inherently batchable hypercall.
> > 
> > Indeed - I called out multicall because of the current implementation,
> > but I should have been more clear.
> > 
> > > To deal with
> > > that we'd need a wrmsr-then-rdmsr operation, or move the entire
> > > execution of the batch onto the target CPU. Since the former would
> > > quickly become unwieldy for more complex operations, I think this
> > > gets us back to aiming at using continue_hypercall_on_cpu() here.
> > 
> > Which gets us back to the problem that you cannot use
> > copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
> > in the wrong context.
> > 
> > 
> > I think this requires a step back and rethink.  I can't offhand think of
> > any combination of existing bits of infrastructure which will allow this
> > to work correctly, which means something new needs designing.
> > 
> How about this:
> 
> 1)  Still do the batch in do_platform_op() but add a iteration field in
> the interface structure.
> 
> 2)  Still use on_selected_cpus() but group the adjacent resource_ops
> which have a same cpu and NO_PREEMPT set into one and do it as a whole
> in the new cpu context.
> 
Any suggestion for this?
> Chao
> 
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-17  9:22                               ` Chao Peng
@ 2014-09-17  9:44                                 ` Jan Beulich
  2014-09-18 13:45                                   ` Chao Peng
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2014-09-17  9:44 UTC (permalink / raw)
  To: Andrew Cooper, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

>>> On 17.09.14 at 11:22, <chao.p.peng@linux.intel.com> wrote:
> On Fri, Sep 12, 2014 at 10:55:43AM +0800, Chao Peng wrote:
>> On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
>> > On 10/09/14 11:25, Jan Beulich wrote:
>> > >>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
>> > >> On 10/09/14 11:07, Jan Beulich wrote:
>> > >>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>> > >>>> Actually, on further thought, using multicalls like this cannot possibly
>> > >>>> be correct from a functional point of view.
>> > >>>>
>> > >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
>> > >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
>> > >>>> with a different natural access, clobbering the results of the wrmsr.
>> > >>>>
>> > >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>> > >>>> synchronous thread of execution on the appropriate cpu.  You can trust
>> > >>>> that interrupts won't play with these msrs, but you absolutely can't
>> > >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
>> > >>> Not sure I follow, particularly in the context of the white listing of
>> > >>> MSRs permitted here (which ought to not include anything the
>> > >>> hypervisor needs control over).
>> > >> Consider two dom0 vcpus both using this new multicall mechanism to read
>> > >> QoS information for different domains, which end up both targeting the
>> > >> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
>> > >> may interleave and clobber the first wrmsr.
>> > > But that situation doesn't result from the multicall use here - it would
>> > > equally be the case for an inherently batchable hypercall.
>> > 
>> > Indeed - I called out multicall because of the current implementation,
>> > but I should have been more clear.
>> > 
>> > > To deal with
>> > > that we'd need a wrmsr-then-rdmsr operation, or move the entire
>> > > execution of the batch onto the target CPU. Since the former would
>> > > quickly become unwieldy for more complex operations, I think this
>> > > gets us back to aiming at using continue_hypercall_on_cpu() here.
>> > 
>> > Which gets us back to the problem that you cannot use
>> > copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
>> > in the wrong context.
>> > 
>> > 
>> > I think this requires a step back and rethink.  I can't offhand think of
>> > any combination of existing bits of infrastructure which will allow this
>> > to work correctly, which means something new needs designing.
>> > 
>> How about this:
>> 
>> 1)  Still do the batch in do_platform_op() but add a iteration field in
>> the interface structure.
>> 
>> 2)  Still use on_selected_cpus() but group the adjacent resource_ops
>> which have a same cpu and NO_PREEMPT set into one and do it as a whole
>> in the new cpu context.
>> 
> Any suggestion for this?

1 is ugly (contradicting everything we do elsewhere), but would be a
last resort option.

2 would be perhaps an option if small, non-preemptible batches
would be handled in do_platform_op() while preemptible larger
groups then ought to use the multicall interface.

Option 3 would be to fiddle with the current vCPU's affinity before
invoking a continuation (perhaps already on the first iteration to
get onto the needed pCPU).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-17  9:44                                 ` Jan Beulich
@ 2014-09-18 13:45                                   ` Chao Peng
  2014-09-18 14:22                                     ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Chao Peng @ 2014-09-18 13:45 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Ian.Jackson, xen-devel, dgdegra

On Wed, Sep 17, 2014 at 10:44:12AM +0100, Jan Beulich wrote:
> >>> On 17.09.14 at 11:22, <chao.p.peng@linux.intel.com> wrote:
> > On Fri, Sep 12, 2014 at 10:55:43AM +0800, Chao Peng wrote:
> >> On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
> >> > On 10/09/14 11:25, Jan Beulich wrote:
> >> > >>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
> >> > >> On 10/09/14 11:07, Jan Beulich wrote:
> >> > >>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
> >> > >>>> Actually, on further thought, using multicalls like this cannot possibly
> >> > >>>> be correct from a functional point of view.
> >> > >>>>
> >> > >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
> >> > >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
> >> > >>>> with a different natural access, clobbering the results of the wrmsr.
> >> > >>>>
> >> > >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
> >> > >>>> synchronous thread of execution on the appropriate cpu.  You can trust
> >> > >>>> that interrupts won't play with these msrs, but you absolutely can't
> >> > >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
> >> > >>> Not sure I follow, particularly in the context of the white listing of
> >> > >>> MSRs permitted here (which ought to not include anything the
> >> > >>> hypervisor needs control over).
> >> > >> Consider two dom0 vcpus both using this new multicall mechanism to read
> >> > >> QoS information for different domains, which end up both targeting the
> >> > >> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
> >> > >> may interleave and clobber the first wrmsr.
> >> > > But that situation doesn't result from the multicall use here - it would
> >> > > equally be the case for an inherently batchable hypercall.
> >> > 
> >> > Indeed - I called out multicall because of the current implementation,
> >> > but I should have been more clear.
> >> > 
> >> > > To deal with
> >> > > that we'd need a wrmsr-then-rdmsr operation, or move the entire
> >> > > execution of the batch onto the target CPU. Since the former would
> >> > > quickly become unwieldy for more complex operations, I think this
> >> > > gets us back to aiming at using continue_hypercall_on_cpu() here.
> >> > 
> >> > Which gets us back to the problem that you cannot use
> >> > copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
> >> > in the wrong context.
> >> > 
> >> > 
> >> > I think this requires a step back and rethink.  I can't offhand think of
> >> > any combination of existing bits of infrastructure which will allow this
> >> > to work correctly, which means something new needs designing.
> >> > 
> >> How about this:
> >> 
> >> 1)  Still do the batch in do_platform_op() but add a iteration field in
> >> the interface structure.
> >> 
> >> 2)  Still use on_selected_cpus() but group the adjacent resource_ops
> >> which have a same cpu and NO_PREEMPT set into one and do it as a whole
> >> in the new cpu context.
> >> 
> > Any suggestion for this?
> 
> 1 is ugly (contradicting everything we do elsewhere), but would be a
> last resort option.
> 
> 2 would be perhaps an option if small, non-preemptible batches
> would be handled in do_platform_op() while preemptible larger
> groups then ought to use the multicall interface.
> 
> Option 3 would be to fiddle with the current vCPU's affinity before
> invoking a continuation (perhaps already on the first iteration to
> get onto the needed pCPU).
> 
Thanks Jan.

On further thought, I think we may over design for this.

Why not make it simple and also scalable?
The answer is also simple: do_platform_op() is always non-preemptible.

It can accept one operation or small batch of operations but it
guarantees all the operations are non-preemptible. (eg it never calls
hypercall_create_continuation() )
It's the minimum unit for non-preemptible operation.

If the caller(userspace tool) wants to make preemptible batch calls,
then multicall mechanism can be employed. 
We don't need to add NO_PREEMPT ability for multicall. Just keep it
preemptible.

This is almost option 2 above.

Chao
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
  2014-09-18 13:45                                   ` Chao Peng
@ 2014-09-18 14:22                                     ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2014-09-18 14:22 UTC (permalink / raw)
  To: Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Andrew Cooper, Ian.Jackson, xen-devel, dgdegra

>>> On 18.09.14 at 15:45, <chao.p.peng@linux.intel.com> wrote:
> On Wed, Sep 17, 2014 at 10:44:12AM +0100, Jan Beulich wrote:
>> >>> On 17.09.14 at 11:22, <chao.p.peng@linux.intel.com> wrote:
>> > On Fri, Sep 12, 2014 at 10:55:43AM +0800, Chao Peng wrote:
>> >> On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
>> >> > On 10/09/14 11:25, Jan Beulich wrote:
>> >> > >>>> On 10.09.14 at 12:15, <andrew.cooper3@citrix.com> wrote:
>> >> > >> On 10/09/14 11:07, Jan Beulich wrote:
>> >> > >>>>>> On 10.09.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>> >> > >>>> Actually, on further thought, using multicalls like this cannot possibly
>> >> > >>>> be correct from a functional point of view.
>> >> > >>>>
>> >> > >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
>> >> > >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
>> >> > >>>> with a different natural access, clobbering the results of the wrmsr.
>> >> > >>>>
>> >> > >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
>> >> > >>>> synchronous thread of execution on the appropriate cpu.  You can trust
>> >> > >>>> that interrupts won't play with these msrs, but you absolutely can't
>> >> > >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
>> >> > >>> Not sure I follow, particularly in the context of the white listing of
>> >> > >>> MSRs permitted here (which ought to not include anything the
>> >> > >>> hypervisor needs control over).
>> >> > >> Consider two dom0 vcpus both using this new multicall mechanism to read
>> >> > >> QoS information for different domains, which end up both targeting the
>> >> > >> same remote cpu.  They will both end up using IPI/wrmsr/IPI/rdmsr, which
>> >> > >> may interleave and clobber the first wrmsr.
>> >> > > But that situation doesn't result from the multicall use here - it would
>> >> > > equally be the case for an inherently batchable hypercall.
>> >> > 
>> >> > Indeed - I called out multicall because of the current implementation,
>> >> > but I should have been more clear.
>> >> > 
>> >> > > To deal with
>> >> > > that we'd need a wrmsr-then-rdmsr operation, or move the entire
>> >> > > execution of the batch onto the target CPU. Since the former would
>> >> > > quickly become unwieldy for more complex operations, I think this
>> >> > > gets us back to aiming at using continue_hypercall_on_cpu() here.
>> >> > 
>> >> > Which gets us back to the problem that you cannot use
>> >> > copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
>> >> > in the wrong context.
>> >> > 
>> >> > 
>> >> > I think this requires a step back and rethink.  I can't offhand think of
>> >> > any combination of existing bits of infrastructure which will allow this
>> >> > to work correctly, which means something new needs designing.
>> >> > 
>> >> How about this:
>> >> 
>> >> 1)  Still do the batch in do_platform_op() but add a iteration field in
>> >> the interface structure.
>> >> 
>> >> 2)  Still use on_selected_cpus() but group the adjacent resource_ops
>> >> which have a same cpu and NO_PREEMPT set into one and do it as a whole
>> >> in the new cpu context.
>> >> 
>> > Any suggestion for this?
>> 
>> 1 is ugly (contradicting everything we do elsewhere), but would be a
>> last resort option.
>> 
>> 2 would be perhaps an option if small, non-preemptible batches
>> would be handled in do_platform_op() while preemptible larger
>> groups then ought to use the multicall interface.
>> 
>> Option 3 would be to fiddle with the current vCPU's affinity before
>> invoking a continuation (perhaps already on the first iteration to
>> get onto the needed pCPU).
>> 
> Thanks Jan.
> 
> On further thought, I think we may over design for this.
> 
> Why not make it simple and also scalable?
> The answer is also simple: do_platform_op() is always non-preemptible.
> 
> It can accept one operation or small batch of operations but it
> guarantees all the operations are non-preemptible. (eg it never calls
> hypercall_create_continuation() )
> It's the minimum unit for non-preemptible operation.
> 
> If the caller(userspace tool) wants to make preemptible batch calls,
> then multicall mechanism can be employed. 
> We don't need to add NO_PREEMPT ability for multicall. Just keep it
> preemptible.
> 
> This is almost option 2 above.

Right, this is what I described for option 2 above.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-10  2:55     ` Chao Peng
@ 2014-09-29 18:52       ` Konrad Rzeszutek Wilk
  2014-09-30  7:45         ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-29 18:52 UTC (permalink / raw)
  To: Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Andrew Cooper, Ian.Jackson, xen-devel, JBeulich, dgdegra

On Wed, Sep 10, 2014 at 10:55:12AM +0800, Chao Peng wrote:
> On Fri, Sep 05, 2014 at 11:59:11AM +0100, Andrew Cooper wrote:
> > On 05/09/14 09:37, Chao Peng wrote:
> > > Add a generic resource access hypercall for tool stack or other
> > > components, e.g., accessing MSR, port I/O, etc.
> > >
> > > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > > ---
> > >  xen/arch/x86/platform_hypercall.c |   63 +++++++++++++++++++++++++++++++++++++
> > >  xen/include/public/platform.h     |   15 +++++++++
> > >  xen/include/xlat.lst              |    1 +
> > >  3 files changed, 79 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> > > index 2162811..e5ad4c9 100644
> > > --- a/xen/arch/x86/platform_hypercall.c
> > > +++ b/xen/arch/x86/platform_hypercall.c
> > > @@ -61,6 +61,42 @@ long cpu_down_helper(void *data);
> > >  long core_parking_helper(void *data);
> > >  uint32_t get_cur_idle_nums(void);
> > >  
> > > +struct xen_resource_access {
> > > +    int32_t ret;
> > > +    struct xenpf_resource_op *data;
> > > +};
> > > +
> > > +static bool_t allow_access_msr(unsigned int msr)
> > > +{
> > > +    return 0;
> > > +}
> > > +
> > > +static void resource_access_one(void *info)
> > > +{
> > > +    struct xen_resource_access *ra = info;
> > > +    int ret = 0;
> > > +
> > > +    switch ( ra->data->cmd )
> > > +    {
> > > +    case XEN_RESOURCE_OP_MSR_READ:
> > > +    case XEN_RESOURCE_OP_MSR_WRITE:
> > > +        if ( ra->data->idx >> 32 )
> > > +            ret = -EINVAL;
> > > +        else if ( !allow_access_msr(ra->data->idx) )
> > > +            ret = -EACCES;
> > > +        else if ( ra->data->cmd == XEN_RESOURCE_OP_MSR_READ )
> > > +            ret = rdmsr_safe(ra->data->idx, ra->data->val);
> > > +        else
> > > +            ret = wrmsr_safe(ra->data->idx, ra->data->val);
> > > +        break;
> > > +    default:
> > > +        ret = -EINVAL;
> > > +        break;
> > > +    }
> > > +
> > > +    ra->ret = ret;
> > > +}
> > > +
> > >  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
> > >  {
> > >      ret_t ret = 0;
> > > @@ -601,6 +637,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
> > >      }
> > >      break;
> > >  
> > > +    case XENPF_resource_op:
> > > +    {
> > > +        struct xen_resource_access ra;
> > > +        struct xenpf_resource_op *rsc_op = &op->u.resource_op;
> > > +        unsigned int cpu = smp_processor_id();
> > > +
> > > +        ra.data = rsc_op;
> > > +
> > > +        if ( rsc_op->cpu == cpu )
> > > +            resource_access_one(&ra);
> > > +        else if ( cpu_online(rsc_op->cpu) )
> > > +            on_selected_cpus(cpumask_of(rsc_op->cpu),
> > 
> > You must validate rsc_op->cpu before using it.  cpumask_of(something
> > large) will happily wander off the end of an array.
> cpu_online() should detect this.

Why would it? It just looks an array and checks to see if the bit is
set. (If you look at the ASSERT in 'cpumask_check' - the assert is not
part of non-debug build).

> > 
> > > +                         resource_access_one, &ra, 1);
> > > +        else
> > > +        {
> > > +            ret = -ENODEV;
> > > +            break;
> > > +        }
> > > +
> > > +        if ( ra.ret )
> > > +        {
> > > +            ret = ra.ret;
> > > +            break;
> > > +        }
> > > +    }
> > > +    break;
> > > +
> > >      default:
> > >          ret = -ENOSYS;
> > >          break;
> > > diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> > > index 053b9fa..3ed50de 100644
> > > --- a/xen/include/public/platform.h
> > > +++ b/xen/include/public/platform.h
> > > @@ -527,6 +527,20 @@ struct xenpf_core_parking {
> > >  typedef struct xenpf_core_parking xenpf_core_parking_t;
> > >  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
> > >  
> > > +#define XENPF_resource_op   61
> > > +
> > > +#define XEN_RESOURCE_OP_MSR_READ  0
> > > +#define XEN_RESOURCE_OP_MSR_WRITE 1
> > > +
> > > +struct xenpf_resource_op {
> > > +    uint16_t cmd;       /* XEN_RESOURCE_OP_* */
> > 
> > There is an alignment issue here between 32 and 64bit. Putting an an
> > explicit unit16_t _resd; field should fix it.
> > 
> > ~Andrew
> OK, thanks.

You can also use 'pahole' on the Xen binary to figure these issues out.

> > 
> > > +    uint32_t cpu;
> > > +    uint64_t idx;
> > > +    uint64_t val;
> > > +};
> > > +typedef struct xenpf_resource_op xenpf_resource_op_t;
> > > +DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t);
> > > +
> > >  /*
> > >   * ` enum neg_errnoval
> > >   * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
> > > @@ -553,6 +567,7 @@ struct xen_platform_op {
> > >          struct xenpf_cpu_hotadd        cpu_add;
> > >          struct xenpf_mem_hotadd        mem_add;
> > >          struct xenpf_core_parking      core_parking;
> > > +        struct xenpf_resource_op       resource_op;
> > >          uint8_t                        pad[128];
> > >      } u;
> > >  };
> > > diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> > > index 9a35dd7..06ed7b9 100644
> > > --- a/xen/include/xlat.lst
> > > +++ b/xen/include/xlat.lst
> > > @@ -88,6 +88,7 @@
> > >  ?	xenpf_enter_acpi_sleep		platform.h
> > >  ?	xenpf_pcpuinfo			platform.h
> > >  ?	xenpf_pcpu_version		platform.h
> > > +?	xenpf_resource_op		platform.h
> > >  !	sched_poll			sched.h
> > >  ?	sched_remote_shutdown		sched.h
> > >  ?	sched_shutdown			sched.h
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall
  2014-09-29 18:52       ` Konrad Rzeszutek Wilk
@ 2014-09-30  7:45         ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2014-09-30  7:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Chao Peng
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	Andrew Cooper, Ian.Jackson, xen-devel, dgdegra

>>> On 29.09.14 at 20:52, <konrad@darnok.org> wrote:
> On Wed, Sep 10, 2014 at 10:55:12AM +0800, Chao Peng wrote:
>> On Fri, Sep 05, 2014 at 11:59:11AM +0100, Andrew Cooper wrote:
>> > On 05/09/14 09:37, Chao Peng wrote:
>> > > Add a generic resource access hypercall for tool stack or other
>> > > components, e.g., accessing MSR, port I/O, etc.
>> > >
>> > > Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
>> > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>> > > ---
>> > >  xen/arch/x86/platform_hypercall.c |   63 
> +++++++++++++++++++++++++++++++++++++
>> > >  xen/include/public/platform.h     |   15 +++++++++
>> > >  xen/include/xlat.lst              |    1 +
>> > >  3 files changed, 79 insertions(+)
>> > >
>> > > diff --git a/xen/arch/x86/platform_hypercall.c 
> b/xen/arch/x86/platform_hypercall.c
>> > > index 2162811..e5ad4c9 100644
>> > > --- a/xen/arch/x86/platform_hypercall.c
>> > > +++ b/xen/arch/x86/platform_hypercall.c
>> > > @@ -61,6 +61,42 @@ long cpu_down_helper(void *data);
>> > >  long core_parking_helper(void *data);
>> > >  uint32_t get_cur_idle_nums(void);
>> > >  
>> > > +struct xen_resource_access {
>> > > +    int32_t ret;
>> > > +    struct xenpf_resource_op *data;
>> > > +};
>> > > +
>> > > +static bool_t allow_access_msr(unsigned int msr)
>> > > +{
>> > > +    return 0;
>> > > +}
>> > > +
>> > > +static void resource_access_one(void *info)
>> > > +{
>> > > +    struct xen_resource_access *ra = info;
>> > > +    int ret = 0;
>> > > +
>> > > +    switch ( ra->data->cmd )
>> > > +    {
>> > > +    case XEN_RESOURCE_OP_MSR_READ:
>> > > +    case XEN_RESOURCE_OP_MSR_WRITE:
>> > > +        if ( ra->data->idx >> 32 )
>> > > +            ret = -EINVAL;
>> > > +        else if ( !allow_access_msr(ra->data->idx) )
>> > > +            ret = -EACCES;
>> > > +        else if ( ra->data->cmd == XEN_RESOURCE_OP_MSR_READ )
>> > > +            ret = rdmsr_safe(ra->data->idx, ra->data->val);
>> > > +        else
>> > > +            ret = wrmsr_safe(ra->data->idx, ra->data->val);
>> > > +        break;
>> > > +    default:
>> > > +        ret = -EINVAL;
>> > > +        break;
>> > > +    }
>> > > +
>> > > +    ra->ret = ret;
>> > > +}
>> > > +
>> > >  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) 
> u_xenpf_op)
>> > >  {
>> > >      ret_t ret = 0;
>> > > @@ -601,6 +637,33 @@ ret_t 
> do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>> > >      }
>> > >      break;
>> > >  
>> > > +    case XENPF_resource_op:
>> > > +    {
>> > > +        struct xen_resource_access ra;
>> > > +        struct xenpf_resource_op *rsc_op = &op->u.resource_op;
>> > > +        unsigned int cpu = smp_processor_id();
>> > > +
>> > > +        ra.data = rsc_op;
>> > > +
>> > > +        if ( rsc_op->cpu == cpu )
>> > > +            resource_access_one(&ra);
>> > > +        else if ( cpu_online(rsc_op->cpu) )
>> > > +            on_selected_cpus(cpumask_of(rsc_op->cpu),
>> > 
>> > You must validate rsc_op->cpu before using it.  cpumask_of(something
>> > large) will happily wander off the end of an array.
>> cpu_online() should detect this.
> 
> Why would it? It just looks an array and checks to see if the bit is
> set. (If you look at the ASSERT in 'cpumask_check' - the assert is not
> part of non-debug build).

Indeed - this needs to be a range check followed by cpu_online().

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-09-30  7:45 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-05  8:37 [PATCH v15 00/11] enable Cache QoS Monitoring (CQM) feature Chao Peng
2014-09-05  8:37 ` [PATCH v15 01/11] multicall: add no preemption ability between two calls Chao Peng
2014-09-05 10:46   ` Andrew Cooper
2014-09-09  6:43     ` Chao Peng
2014-09-09 10:39       ` Jan Beulich
2014-09-09 10:51         ` Andrew Cooper
2014-09-09 11:51           ` Jan Beulich
2014-09-09 12:44             ` Andrew Cooper
2014-09-09 13:15               ` Jan Beulich
2014-09-10  1:32                 ` Chao Peng
2014-09-10  9:43                   ` Andrew Cooper
2014-09-10 10:07                     ` Jan Beulich
2014-09-10 10:15                       ` Andrew Cooper
2014-09-10 10:25                         ` Jan Beulich
2014-09-10 11:12                           ` Andrew Cooper
2014-09-12  2:55                             ` Chao Peng
2014-09-17  9:22                               ` Chao Peng
2014-09-17  9:44                                 ` Jan Beulich
2014-09-18 13:45                                   ` Chao Peng
2014-09-18 14:22                                     ` Jan Beulich
2014-09-05  8:37 ` [PATCH v15 02/11] x86: add generic resource (e.g. MSR) access hypercall Chao Peng
2014-09-05 10:59   ` Andrew Cooper
2014-09-05 11:49     ` Jan Beulich
2014-09-10  2:55     ` Chao Peng
2014-09-29 18:52       ` Konrad Rzeszutek Wilk
2014-09-30  7:45         ` Jan Beulich
2014-09-05  8:37 ` [PATCH v15 03/11] xsm: add resource operation related xsm policy Chao Peng
2014-09-05  8:37 ` [PATCH v15 04/11] tools: provide interface for generic resource access Chao Peng
2014-09-05  8:37 ` [PATCH v15 05/11] x86: detect and initialize Platform QoS Monitoring feature Chao Peng
2014-09-05 11:05   ` Andrew Cooper
2014-09-05  8:37 ` [PATCH v15 06/11] x86: dynamically attach/detach QoS monitoring service for a guest Chao Peng
2014-09-05  8:37 ` [PATCH v15 07/11] x86: collect global QoS monitoring information Chao Peng
2014-09-05  8:37 ` [PATCH v15 08/11] x86: enable QoS monitoring for each domain RMID Chao Peng
2014-09-05  8:37 ` [PATCH v15 09/11] x86: add QoS monitoring related MSRs in allowed list Chao Peng
2014-09-05  8:37 ` [PATCH v15 10/11] xsm: add platform QoS related xsm policies Chao Peng
2014-09-05  8:37 ` [PATCH v15 11/11] tools: CMDs and APIs for Platform QoS Monitoring Chao Peng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).