All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs
@ 2015-01-20  9:45 Chao Peng
  2015-01-20  9:45 ` [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

Changes from v3:
* Get timestamp information from host along with the monitoring counter;
  This is required for counter overlow detection.
* Address comments from Wei on the last patch.

Changes from v2:
* Remove the usage of "static" to cache data in xc;
  NOTE: Other places that already existed before are not touched due to
        the needs for API change. Will fix in separate patch if desirable.
* Coding style;

Changes from v1:
* Move event type check from xc to xl;
* Add retry capability for MBM sampling;
* Fix Coding style/docs;

Intel Memory Bandwidth Monitoring(MBM) is a new hardware feature
which builds on the CMT infrastructure to allow monitoring of system
memory bandwidth. Event codes are provided to monitor both "total"
and "local" bandwidth, meaning bandwidth over QPI and other external
links can be monitored.

For XEN, MBM is used to monitor memory bandwidth for VMs. Due to its
dependency on CMT, the software also makes use of most of CMT codes.
Actually, besides introducing two additional events and some cpuid
feature bits, there are no extra changes compared to cache occupancy
monitoring in CMT. Due to this, CMT should be enabled first to use
this feature.

For interface changes, the patch serial introduces a new command
"XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature
capability to user space and modified "resource_op" to support reading
host timestamp.
Tool stack side, two additional options introduced for "xl psr-cmt-show":
total_mem_bandwidth:     Show total memory bandwidth
local_mem_bandwidth:     Show local memory bandwidth

The usage flow keeps the same with CMT.

Chao Peng (5):
  x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  tools: add routine to get CMT L3 event mask
  tools: correct coding style for psr
  tools: code refactoring for MBM
  tools: add total/local memory bandwith monitoring

 docs/man/xl.pod.1                 |   9 +++
 tools/libxc/include/xenctrl.h     |  14 ++--
 tools/libxc/xc_msr_x86.h          |   1 +
 tools/libxc/xc_psr.c              |  59 ++++++++++++---
 tools/libxl/libxl.h               |  22 +++++-
 tools/libxl/libxl_psr.c           | 155 ++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_types.idl       |   2 +
 tools/libxl/xl_cmdimpl.c          |  78 +++++++++++++------
 tools/libxl/xl_cmdtable.c         |   4 +-
 xen/arch/x86/platform_hypercall.c |   8 +-
 10 files changed, 295 insertions(+), 57 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
@ 2015-01-20  9:45 ` Chao Peng
  2015-01-20 10:02   ` Jan Beulich
  2015-01-20  9:45 ` [PATCH v4 2/5] tools: add routine to get CMT L3 event mask Chao Peng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

Memory bandwidth monitoring requires timestamp returned along with the
monitoring counter to verify the correctness of the counter value.
Allow MSR_IA32_TSC to be read and increase RESOURCE_ACCESS_MAX_ENTRIES
to 3.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 xen/arch/x86/platform_hypercall.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 32f39b2..e80b857 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -61,14 +61,14 @@ long cpu_down_helper(void *data);
 long core_parking_helper(void *data);
 uint32_t get_cur_idle_nums(void);
 
-#define RESOURCE_ACCESS_MAX_ENTRIES 2
+#define RESOURCE_ACCESS_MAX_ENTRIES 3
 struct xen_resource_access {
     unsigned int nr_done;
     unsigned int nr_entries;
     xenpf_resource_entry_t *entries;
 };
 
-static bool_t allow_access_msr(unsigned int msr)
+static bool_t allow_access_msr(unsigned int msr, unsigned int cmd)
 {
     switch ( msr )
     {
@@ -76,6 +76,8 @@ static bool_t allow_access_msr(unsigned int msr)
     case MSR_IA32_CMT_EVTSEL:
     case MSR_IA32_CMT_CTR:
         return 1;
+    case MSR_IA32_TSC:
+        return cmd == XEN_RESOURCE_OP_MSR_READ;
     }
 
     return 0;
@@ -102,7 +104,7 @@ static void check_resource_access(struct xen_resource_access *ra)
         case XEN_RESOURCE_OP_MSR_WRITE:
             if ( entry->idx >> 32 )
                 ret = -EINVAL;
-            else if ( !allow_access_msr(entry->idx) )
+            else if ( !allow_access_msr(entry->idx, entry->u.cmd) )
                 ret = -EACCES;
             break;
         default:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/5] tools: add routine to get CMT L3 event mask
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
  2015-01-20  9:45 ` [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
@ 2015-01-20  9:45 ` Chao Peng
  2015-01-20  9:45 ` [PATCH v4 3/5] tools: correct coding style for psr Chao Peng
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask
of XEN_SYSCTL_psr_cmt_op.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 tools/libxc/include/xenctrl.h |  1 +
 tools/libxc/xc_psr.c          | 17 +++++++++++++++++
 tools/libxl/libxl.h           |  1 +
 tools/libxl/libxl_psr.c       | 15 +++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..96b357c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2697,6 +2697,7 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
     uint32_t *upscaling_factor);
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
     uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index 872e6dc..ac19fe4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -112,6 +112,23 @@ int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
     return rc;
 }
 
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
+{
+    int rc;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_psr_cmt_op;
+    sysctl.u.psr_cmt_op.cmd =
+        XEN_SYSCTL_PSR_CMT_get_l3_event_mask;
+    sysctl.u.psr_cmt_op.flags = 0;
+
+    rc = xc_sysctl(xch, &sysctl);
+    if ( !rc )
+        *event_mask = sysctl.u.psr_cmt_op.u.data;
+
+    return rc;
+}
+
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
                                       uint32_t *l3_cache_size)
 {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..c9a64f9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1456,6 +1456,7 @@ int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
 int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
     uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
     uint32_t socketid, uint32_t *l3_cache_occupancy);
 #endif
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 0437465..07f2aee 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -160,6 +160,21 @@ out:
     return rc;
 }
 
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = xc_psr_cmt_get_l3_event_mask(ctx->xch, event_mask);
+    if (rc < 0) {
+        libxl__psr_cmt_log_err_msg(gc, errno);
+        rc = ERROR_FAIL;
+    }
+
+    GC_FREE;
+    return rc;
+}
+
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
     uint32_t socketid, uint32_t *l3_cache_occupancy)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/5] tools: correct coding style for psr
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
  2015-01-20  9:45 ` [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
  2015-01-20  9:45 ` [PATCH v4 2/5] tools: add routine to get CMT L3 event mask Chao Peng
@ 2015-01-20  9:45 ` Chao Peng
  2015-01-20  9:45 ` [PATCH v4 4/5] tools: code refactoring for MBM Chao Peng
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

- space: remove space after '(' or before ')' in 'if' condition;
- indention: align function definition/call arguments;

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenctrl.h | 10 +++++-----
 tools/libxc/xc_psr.c          | 10 +++++-----
 tools/libxl/libxl.h           | 11 +++++++----
 tools/libxl/libxl_psr.c       | 11 +++++++----
 tools/libxl/xl_cmdimpl.c      | 11 ++++++-----
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 96b357c..8a0eab6 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2693,15 +2693,15 @@ typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-    uint32_t *rmid);
+                               uint32_t *rmid);
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-    uint32_t *upscaling_factor);
+                                       uint32_t *upscaling_factor);
 int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
-    uint32_t *l3_cache_size);
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-    uint32_t cpu, uint32_t psr_cmt_type, uint64_t *monitor_data);
+                                 uint32_t *l3_cache_size);
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+                        uint32_t psr_cmt_type, uint64_t *monitor_data);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index ac19fe4..a9881a4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -47,7 +47,7 @@ int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid)
 }
 
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-                                    uint32_t *rmid)
+                               uint32_t *rmid)
 {
     int rc;
     DECLARE_DOMCTL;
@@ -88,7 +88,7 @@ int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid)
 }
 
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-                                            uint32_t *upscaling_factor)
+                                       uint32_t *upscaling_factor)
 {
     static int val = 0;
     int rc;
@@ -130,7 +130,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
 }
 
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
-                                      uint32_t *l3_cache_size)
+                                 uint32_t *l3_cache_size)
 {
     static int val = 0;
     int rc;
@@ -155,8 +155,8 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
     return rc;
 }
 
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-    uint32_t cpu, xc_psr_cmt_type type, uint64_t *monitor_data)
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+                        xc_psr_cmt_type type, uint64_t *monitor_data)
 {
     xc_resource_op_t op;
     xc_resource_entry_t entries[2];
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c9a64f9..596d2a0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1454,11 +1454,14 @@ int libxl_psr_cmt_detach(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
-    uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+                                    uint32_t socketid,
+                                    uint32_t *l3_cache_size);
 int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
-    uint32_t socketid, uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 07f2aee..84819e6 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -135,8 +135,9 @@ int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid)
     return rc;
 }
 
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
-                                         uint32_t *l3_cache_size)
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+                                    uint32_t socketid,
+                                    uint32_t *l3_cache_size)
 {
     GC_INIT(ctx);
 
@@ -175,8 +176,10 @@ int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
     return rc;
 }
 
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
-    uint32_t socketid, uint32_t *l3_cache_occupancy)
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy)
 {
     GC_INIT(ctx);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 0b02a6c..8fce979 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7808,7 +7808,7 @@ out:
 
 #ifdef LIBXL_HAVE_PSR_CMT
 static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
-                                                    uint32_t nr_sockets)
+                                                 uint32_t nr_sockets)
 {
     char *domain_name;
     uint32_t socketid;
@@ -7822,8 +7822,8 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
     free(domain_name);
 
     for (socketid = 0; socketid < nr_sockets; socketid++) {
-        if ( !libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid,
-                 socketid, &l3_cache_occupancy) )
+        if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
+                                               &l3_cache_occupancy))
             printf("%13u KB", l3_cache_occupancy);
     }
 
@@ -7871,8 +7871,9 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
     for (socketid = 0; socketid < nr_sockets; socketid++) {
         rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size);
         if (rc < 0) {
-            fprintf(stderr, "Failed to get system l3 cache size for socket:%d\n",
-                            socketid);
+            fprintf(stderr,
+                    "Failed to get system l3 cache size for socket:%d\n",
+                    socketid);
             return -1;
         }
         printf("%13u KB", l3_cache_size);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/5] tools: code refactoring for MBM
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
                   ` (2 preceding siblings ...)
  2015-01-20  9:45 ` [PATCH v4 3/5] tools: correct coding style for psr Chao Peng
@ 2015-01-20  9:45 ` Chao Peng
  2015-01-20  9:45 ` [PATCH v4 5/5] tools: add total/local memory bandwith monitoring Chao Peng
  2015-01-20 13:49 ` [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Andrew Cooper
  5 siblings, 0 replies; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

Make some internal routines common so that total/local memory bandwidth
monitoring in the next patch can make use of them.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_psr.c  | 44 +++++++++++++++++++++++++--------------
 tools/libxl/xl_cmdimpl.c | 54 ++++++++++++++++++++++++++++--------------------
 2 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 84819e6..c88c421 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -176,20 +176,16 @@ int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
     return rc;
 }
 
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
-                                      uint32_t domid,
-                                      uint32_t socketid,
-                                      uint32_t *l3_cache_occupancy)
+static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
+                                                 uint32_t domid,
+                                                 xc_psr_cmt_type type,
+                                                 uint32_t socketid,
+                                                 uint64_t *data)
 {
-    GC_INIT(ctx);
-
     unsigned int rmid;
-    uint32_t upscaling_factor;
-    uint64_t monitor_data;
     int cpu, rc;
-    xc_psr_cmt_type type;
 
-    rc = xc_psr_cmt_get_domain_rmid(ctx->xch, domid, &rmid);
+    rc = xc_psr_cmt_get_domain_rmid(CTX->xch, domid, &rmid);
     if (rc < 0 || rmid == 0) {
         LOGE(ERROR, "fail to get the domain rmid, "
             "or domain is not attached with platform QoS monitoring service");
@@ -204,14 +200,32 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
         goto out;
     }
 
-    type = XC_PSR_CMT_L3_OCCUPANCY;
-    rc = xc_psr_cmt_get_data(ctx->xch, rmid, cpu, type, &monitor_data);
+    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data);
     if (rc < 0) {
         LOGE(ERROR, "failed to get monitoring data");
         rc = ERROR_FAIL;
-        goto out;
     }
 
+out:
+    return rc;
+}
+
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+                                      uint32_t domid,
+                                      uint32_t socketid,
+                                      uint32_t *l3_cache_occupancy)
+{
+    GC_INIT(ctx);
+    uint64_t data;
+    uint32_t upscaling_factor;
+    int rc;
+
+    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
+                                               XC_PSR_CMT_L3_OCCUPANCY,
+                                               socketid, &data);
+    if (rc < 0)
+            goto out;
+
     rc = xc_psr_cmt_get_l3_upscaling_factor(ctx->xch, &upscaling_factor);
     if (rc < 0) {
         LOGE(ERROR, "failed to get L3 upscaling factor");
@@ -219,8 +233,8 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
         goto out;
     }
 
-    *l3_cache_occupancy = upscaling_factor * monitor_data / 1024;
-    rc = 0;
+    *l3_cache_occupancy = upscaling_factor * data / 1024;
+
 out:
     GC_FREE;
     return rc;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8fce979..1827c63 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7807,12 +7807,13 @@ out:
 }
 
 #ifdef LIBXL_HAVE_PSR_CMT
-static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
-                                                 uint32_t nr_sockets)
+static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
+                                         libxl_psr_cmt_type type,
+                                         uint32_t nr_sockets)
 {
     char *domain_name;
     uint32_t socketid;
-    uint32_t l3_cache_occupancy;
+    uint32_t data;
 
     if (!libxl_psr_cmt_domain_attached(ctx, dominfo->domid))
         return;
@@ -7822,15 +7823,21 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
     free(domain_name);
 
     for (socketid = 0; socketid < nr_sockets; socketid++) {
-        if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
-                                               &l3_cache_occupancy))
-            printf("%13u KB", l3_cache_occupancy);
+        switch (type) {
+        case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+            if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid,
+                                                   socketid, &data))
+                printf("%13u KB", data);
+            break;
+        default:
+            return;
+        }
     }
 
     printf("\n");
 }
 
-static int psr_cmt_show_cache_occupancy(uint32_t domid)
+static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
 {
     uint32_t i, socketid, nr_sockets, total_rmid;
     uint32_t l3_cache_size;
@@ -7866,19 +7873,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
         printf("%14s %d", "Socket", socketid);
     printf("\n");
 
-    /* Total L3 cache size */
-    printf("%-46s", "Total L3 Cache Size");
-    for (socketid = 0; socketid < nr_sockets; socketid++) {
-        rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size);
-        if (rc < 0) {
-            fprintf(stderr,
-                    "Failed to get system l3 cache size for socket:%d\n",
-                    socketid);
-            return -1;
-        }
-        printf("%13u KB", l3_cache_size);
+    if (type == LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY) {
+            /* Total L3 cache size */
+            printf("%-46s", "Total L3 Cache Size");
+            for (socketid = 0; socketid < nr_sockets; socketid++) {
+                rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid,
+                                                     &l3_cache_size);
+                if (rc < 0) {
+                    fprintf(stderr,
+                            "Failed to get system l3 cache size for socket:%d\n",
+                            socketid);
+                    return -1;
+                }
+                printf("%13u KB", l3_cache_size);
+            }
+            printf("\n");
     }
-    printf("\n");
 
     /* Each domain */
     if (domid != INVALID_DOMID) {
@@ -7887,7 +7897,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
             fprintf(stderr, "Failed to get domain info for %d\n", domid);
             return -1;
         }
-        psr_cmt_print_domain_cache_occupancy(&dominfo, nr_sockets);
+        psr_cmt_print_domain_l3_info(&dominfo, type, nr_sockets);
     }
     else
     {
@@ -7897,7 +7907,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
             return -1;
         }
         for (i = 0; i < nr_domains; i++)
-            psr_cmt_print_domain_cache_occupancy(list + i, nr_sockets);
+            psr_cmt_print_domain_l3_info(list + i, type, nr_sockets);
         libxl_dominfo_list_free(list, nr_domains);
     }
     return 0;
@@ -7956,7 +7966,7 @@ int main_psr_cmt_show(int argc, char **argv)
 
     switch (type) {
     case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
-        ret = psr_cmt_show_cache_occupancy(domid);
+        ret = psr_cmt_show_l3_info(type, domid);
         break;
     default:
         help("psr-cmt-show");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/5] tools: add total/local memory bandwith monitoring
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
                   ` (3 preceding siblings ...)
  2015-01-20  9:45 ` [PATCH v4 4/5] tools: code refactoring for MBM Chao Peng
@ 2015-01-20  9:45 ` Chao Peng
  2015-01-20 13:49 ` [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Andrew Cooper
  5 siblings, 0 replies; 11+ messages in thread
From: Chao Peng @ 2015-01-20  9:45 UTC (permalink / raw)
  To: xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

Add Memory Bandwidth Monitoring(MBM) for VMs. Two types of monitoring
are supported: total and local memory bandwidth monitoring. To use it,
CMT should be enabled in hypervisor.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
Changes in v4:
1. Get timestamp from hypervisor and use that for bandwidth calculation.
2. Minor document and coding style fix.
---
 docs/man/xl.pod.1             |  9 ++++
 tools/libxc/include/xenctrl.h |  5 ++-
 tools/libxc/xc_msr_x86.h      |  1 +
 tools/libxc/xc_psr.c          | 34 ++++++++++++---
 tools/libxl/libxl.h           | 10 +++++
 tools/libxl/libxl_psr.c       | 99 +++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_types.idl   |  2 +
 tools/libxl/xl_cmdimpl.c      | 33 ++++++++++++---
 tools/libxl/xl_cmdtable.c     |  4 +-
 9 files changed, 182 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 6b89ba8..50759c0 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with
 the monitoring service. When the domain doesn't need to be monitored any more,
 detach the domain id from the monitoring service.
 
+Intel Broadwell and later server platforms also offer total/local memory
+bandwidth monitoring. Xen supports per-domain monitoring for these two
+additional monitoring types. Both memory bandwidth monitoring and L3 cache
+occupancy monitoring share the same set of underlying monitoring service. Once
+a domain is attached to the monitoring service, monitoring data can be showed
+for any of these monitoring types.
+
 =over 4
 
 =item B<psr-cmt-attach> [I<domain-id>]
@@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain.
 Show monitoring data for a certain domain or all domains. Current supported
 monitor types are:
  - "cache-occupancy": showing the L3 cache occupancy.
+ - "total-mem-bandwidth": showing the total memory bandwidth.
+ - "local-mem-bandwidth": showing the local memory bandwidth.
 
 =back
 
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 8a0eab6..8f964ad 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops);
 #if defined(__i386__) || defined(__x86_64__)
 enum xc_psr_cmt_type {
     XC_PSR_CMT_L3_OCCUPANCY,
+    XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+    XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
 };
 typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
@@ -2701,7 +2703,8 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
                                  uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-                        uint32_t psr_cmt_type, uint64_t *monitor_data);
+                        uint32_t psr_cmt_type, uint64_t *monitor_data,
+                        uint64_t *tsc);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h
index 7c3e1a3..7f100e7 100644
--- a/tools/libxc/xc_msr_x86.h
+++ b/tools/libxc/xc_msr_x86.h
@@ -20,6 +20,7 @@
 #ifndef XC_MSR_X86_H
 #define XC_MSR_X86_H
 
+#define MSR_IA32_TSC            0x00000010
 #define MSR_IA32_CMT_EVTSEL     0x00000c8d
 #define MSR_IA32_CMT_CTR        0x00000c8e
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index a9881a4..9f8a4f9 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -23,6 +23,8 @@
 #define IA32_CMT_CTR_ERROR_MASK         (0x3ull << 62)
 
 #define EVTID_L3_OCCUPANCY             0x1
+#define EVTID_TOTAL_MEM_BANDWIDTH      0x2
+#define EVTID_LOCAL_MEM_BANDWIDTH      0x3
 
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid)
 {
@@ -156,11 +158,12 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
 }
 
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-                        xc_psr_cmt_type type, uint64_t *monitor_data)
+                        xc_psr_cmt_type type, uint64_t *monitor_data,
+                        uint64_t *tsc)
 {
     xc_resource_op_t op;
-    xc_resource_entry_t entries[2];
-    uint32_t evtid;
+    xc_resource_entry_t entries[3];
+    uint32_t evtid, nr_entries;
     int rc;
 
     switch ( type )
@@ -168,6 +171,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
     case XC_PSR_CMT_L3_OCCUPANCY:
         evtid = EVTID_L3_OCCUPANCY;
         break;
+    case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH:
+        evtid = EVTID_TOTAL_MEM_BANDWIDTH;
+        break;
+    case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH:
+        evtid = EVTID_LOCAL_MEM_BANDWIDTH;
+        break;
     default:
         return -1;
     }
@@ -182,19 +191,34 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
     entries[1].val = 0;
     entries[1].rsvd = 0;
 
+    if ( type == XC_PSR_CMT_L3_OCCUPANCY )
+        nr_entries = 2;
+    else
+    {
+        entries[2].u.cmd = XEN_RESOURCE_OP_MSR_READ;
+        entries[2].idx = MSR_IA32_TSC;
+        entries[2].val = 0;
+        entries[2].rsvd = 0;
+        nr_entries = 3;
+    }
+
     op.cpu = cpu;
-    op.nr_entries = 2;
+    op.nr_entries = nr_entries;
     op.entries = entries;
 
     rc = xc_resource_op(xch, 1, &op);
     if ( rc < 0 )
         return rc;
 
-    if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
+    if ( op.result != nr_entries || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
         return -1;
 
     *monitor_data = entries[1].val;
 
+    if ( type == XC_PSR_CMT_TOTAL_MEM_BANDWIDTH ||
+         type == XC_PSR_CMT_LOCAL_MEM_BANDWIDTH )
+        *tsc = entries[2].val;
+
     return 0;
 }
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 596d2a0..789560f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1462,6 +1462,16 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
                                       uint32_t domid,
                                       uint32_t socketid,
                                       uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t cpu_khz,
+                                          uint32_t *bandwidth);
+int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t cpu_khz,
+                                          uint32_t *bandwidth);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index c88c421..b1a0412 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -18,6 +18,7 @@
 
 
 #define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+#define MBM_SAMPLE_RETRY_MAX 4
 
 static void libxl__psr_cmt_log_err_msg(libxl__gc *gc, int err)
 {
@@ -180,7 +181,8 @@ static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
                                                  uint32_t domid,
                                                  xc_psr_cmt_type type,
                                                  uint32_t socketid,
-                                                 uint64_t *data)
+                                                 uint64_t *data,
+                                                 uint64_t *tsc)
 {
     unsigned int rmid;
     int cpu, rc;
@@ -200,7 +202,7 @@ static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
         goto out;
     }
 
-    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data);
+    rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data, tsc);
     if (rc < 0) {
         LOGE(ERROR, "failed to get monitoring data");
         rc = ERROR_FAIL;
@@ -222,7 +224,7 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
 
     rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
                                                XC_PSR_CMT_L3_OCCUPANCY,
-                                               socketid, &data);
+                                               socketid, &data, NULL);
     if (rc < 0)
             goto out;
 
@@ -240,6 +242,101 @@ out:
     return rc;
 }
 
+static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
+                                            uint32_t domid,
+                                            xc_psr_cmt_type type,
+                                            uint32_t socketid,
+                                            uint32_t cpu_khz,
+                                            uint32_t *bandwidth)
+{
+    uint64_t sample1, sample2;
+    uint64_t tsc1, tsc2;
+    uint32_t upscaling_factor;
+    int retry_attempts = 0;
+    int rc;
+
+    while (1) {
+        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
+                                                   &sample1, &tsc1);
+        if (rc < 0) {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        usleep(10000);
+
+        rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
+                                                   &sample2, &tsc2);
+        if (rc < 0) {
+           rc = ERROR_FAIL;
+           goto out;
+        }
+        if (tsc2 <= tsc1) {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        /*
+         * Hardware guarantees at most 1 overflow can happen if the duration
+         * between two samples is less than 1 second.
+         */
+        if (tsc2 - tsc1 < cpu_khz * 1000 && sample2 >= sample1)
+            break;
+
+        if (retry_attempts < MBM_SAMPLE_RETRY_MAX) {
+            retry_attempts++;
+        } else {
+            LOGE(ERROR, "event counter overflowed");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+    }
+
+    rc = xc_psr_cmt_get_l3_upscaling_factor(CTX->xch, &upscaling_factor);
+    if (rc < 0) {
+        LOGE(ERROR, "failed to get L3 upscaling factor");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    *bandwidth = (sample2 - sample1) * 1000 * cpu_khz / (tsc2 - tsc1)
+                 *  upscaling_factor / 1024;
+out:
+    return rc;
+}
+
+int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t cpu_khz,
+                                          uint32_t *bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__psr_cmt_get_mem_bandwidth(gc, domid,
+                                          XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+                                          socketid, cpu_khz, bandwidth);
+    GC_FREE;
+    return rc;
+}
+
+int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
+                                          uint32_t domid,
+                                          uint32_t socketid,
+                                          uint32_t cpu_khz,
+                                          uint32_t *bandwidth)
+{
+    GC_INIT(ctx);
+    int rc;
+
+    rc = libxl__psr_cmt_get_mem_bandwidth(gc, domid,
+                                          XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
+                                          socketid, cpu_khz, bandwidth);
+    GC_FREE;
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1214d2e..46d160e 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -694,4 +694,6 @@ libxl_event = Struct("event",[
 
 libxl_psr_cmt_type = Enumeration("psr_cmt_type", [
     (1, "CACHE_OCCUPANCY"),
+    (2, "TOTAL_MEM_BANDWIDTH"),
+    (3, "LOCAL_MEM_BANDWIDTH"),
     ])
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 1827c63..da7443e 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7809,7 +7809,8 @@ out:
 #ifdef LIBXL_HAVE_PSR_CMT
 static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
                                          libxl_psr_cmt_type type,
-                                         uint32_t nr_sockets)
+                                         uint32_t nr_sockets,
+                                         uint32_t cpu_khz)
 {
     char *domain_name;
     uint32_t socketid;
@@ -7829,6 +7830,18 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
                                                    socketid, &data))
                 printf("%13u KB", data);
             break;
+        case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
+            if (!libxl_psr_cmt_get_total_mem_bandwidth(ctx, dominfo->domid,
+                                                       socketid, cpu_khz,
+                                                       &data))
+                printf("%11u KB/s", data);
+            break;
+        case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
+            if (!libxl_psr_cmt_get_local_mem_bandwidth(ctx, dominfo->domid,
+                                                       socketid, cpu_khz,
+                                                       &data))
+                printf("%11u KB/s", data);
+            break;
         default:
             return;
         }
@@ -7839,8 +7852,8 @@ static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
 
 static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
 {
-    uint32_t i, socketid, nr_sockets, total_rmid;
-    uint32_t l3_cache_size;
+    uint32_t i, socketid, nr_sockets, total_rmid, cpu_khz;
+    uint32_t l3_cache_size, l3_event_mask;
     libxl_physinfo info;
     int rc, nr_domains;
 
@@ -7849,6 +7862,13 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
         return -1;
     }
 
+    rc = libxl_psr_cmt_get_l3_event_mask(ctx, &l3_event_mask);
+    if (rc < 0 || !(l3_event_mask & (1 << (type - 1)))) {
+        fprintf(stderr, "Monitor type '%s' is not supported in the system\n",
+                libxl_psr_cmt_type_to_string(type));
+        return -1;
+    }
+
     libxl_physinfo_init(&info);
     rc = libxl_get_physinfo(ctx, &info);
     if (rc < 0) {
@@ -7857,6 +7877,7 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
         return -1;
     }
     nr_sockets = info.nr_cpus / info.threads_per_core / info.cores_per_socket;
+    cpu_khz = info.cpu_khz;
     libxl_physinfo_dispose(&info);
 
     rc = libxl_psr_cmt_get_total_rmid(ctx, &total_rmid);
@@ -7897,7 +7918,7 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
             fprintf(stderr, "Failed to get domain info for %d\n", domid);
             return -1;
         }
-        psr_cmt_print_domain_l3_info(&dominfo, type, nr_sockets);
+        psr_cmt_print_domain_l3_info(&dominfo, type, nr_sockets, cpu_khz);
     }
     else
     {
@@ -7907,7 +7928,7 @@ static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
             return -1;
         }
         for (i = 0; i < nr_domains; i++)
-            psr_cmt_print_domain_l3_info(list + i, type, nr_sockets);
+            psr_cmt_print_domain_l3_info(list + i, type, nr_sockets, cpu_khz);
         libxl_dominfo_list_free(list, nr_domains);
     }
     return 0;
@@ -7966,6 +7987,8 @@ int main_psr_cmt_show(int argc, char **argv)
 
     switch (type) {
     case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+    case LIBXL_PSR_CMT_TYPE_TOTAL_MEM_BANDWIDTH:
+    case LIBXL_PSR_CMT_TYPE_LOCAL_MEM_BANDWIDTH:
         ret = psr_cmt_show_l3_info(type, domid);
         break;
     default:
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4b30d3d..2d8f272 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -538,7 +538,9 @@ struct cmd_spec cmd_table[] = {
       "Show Cache Monitoring Technology information",
       "<PSR-CMT-Type> <Domain>",
       "Available monitor types:\n"
-      "\"cache_occupancy\":         Show L3 cache occupancy\n",
+      "\"cache_occupancy\":         Show L3 cache occupancy\n"
+      "\"total_mem_bandwidth\":     Show total memory bandwidth\n"
+      "\"local_mem_bandwidth\":     Show local memory bandwidth\n",
     },
 #endif
 };
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-20  9:45 ` [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
@ 2015-01-20 10:02   ` Jan Beulich
  2015-01-20 11:20     ` Chao Peng
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2015-01-20 10:02 UTC (permalink / raw)
  To: Chao Peng
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	xen-devel, will.auld, keir

>>> On 20.01.15 at 10:45, <chao.p.peng@linux.intel.com> wrote:
> Memory bandwidth monitoring requires timestamp returned along with the
> monitoring counter to verify the correctness of the counter value.
> Allow MSR_IA32_TSC to be read and increase RESOURCE_ACCESS_MAX_ENTRIES
> to 3.

Do you really need an MSR access here, i.e. is RDTSC not
sufficient?

> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -61,14 +61,14 @@ long cpu_down_helper(void *data);
>  long core_parking_helper(void *data);
>  uint32_t get_cur_idle_nums(void);
>  
> -#define RESOURCE_ACCESS_MAX_ENTRIES 2
> +#define RESOURCE_ACCESS_MAX_ENTRIES 3

No - the limit on the number of consecutive entries doesn't change
because of your addition of another MSR. Actual batching is to be
done via multicalls, the multi-entry requests here are meant to be
used for successive operations that _must_ be issued without
preemption (e.g. an MSR write needed for a successive read).

>  struct xen_resource_access {
>      unsigned int nr_done;
>      unsigned int nr_entries;
>      xenpf_resource_entry_t *entries;
>  };
>  
> -static bool_t allow_access_msr(unsigned int msr)
> +static bool_t allow_access_msr(unsigned int msr, unsigned int cmd)

Since I'm going to ask you (below) to special case MSR_IA32_TSC on
the read path, I think the write denial should also be handled specially,
and also ideally with a distinguishable (from the -EACCES) error code
(perhaps -EPERM). I.e. this change should be dropped.

> @@ -102,7 +104,7 @@ static void check_resource_access(struct xen_resource_access *ra)
>          case XEN_RESOURCE_OP_MSR_WRITE:
>              if ( entry->idx >> 32 )
>                  ret = -EINVAL;
> -            else if ( !allow_access_msr(entry->idx) )
> +            else if ( !allow_access_msr(entry->idx, entry->u.cmd) )
>                  ret = -EACCES;
>              break;

Unless you have a reason for the read to be done through RDMSR,
I'd really prefer you to special case the TSC and use RDTSC on the
read side (and, as said above, deny the write in resource_access()
rather than in check_resource_access()).

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-20 10:02   ` Jan Beulich
@ 2015-01-20 11:20     ` Chao Peng
  2015-01-20 11:44       ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Peng @ 2015-01-20 11:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	xen-devel, will.auld, keir

On Tue, Jan 20, 2015 at 10:02:32AM +0000, Jan Beulich wrote:
> >>> On 20.01.15 at 10:45, <chao.p.peng@linux.intel.com> wrote:
> > Memory bandwidth monitoring requires timestamp returned along with the
> > monitoring counter to verify the correctness of the counter value.
> > Allow MSR_IA32_TSC to be read and increase RESOURCE_ACCESS_MAX_ENTRIES
> > to 3.
> 
> Do you really need an MSR access here, i.e. is RDTSC not
> sufficient?

RDTSC is enough. The purpose of using MSR is to reuse the existed
XEN_RESOURCE_OP_MSR_READ command. A new command XEN_RESOURCE_OP_TSC_READ
(looks like not so generic) is needed If using RDTSC. If it’s not the
issue, then RDTSC can be used instead.

> 
> > --- a/xen/arch/x86/platform_hypercall.c
> > +++ b/xen/arch/x86/platform_hypercall.c
> > @@ -61,14 +61,14 @@ long cpu_down_helper(void *data);
> >  long core_parking_helper(void *data);
> >  uint32_t get_cur_idle_nums(void);
> >  
> > -#define RESOURCE_ACCESS_MAX_ENTRIES 2
> > +#define RESOURCE_ACCESS_MAX_ENTRIES 3
> 
> No - the limit on the number of consecutive entries doesn't change
> because of your addition of another MSR. Actual batching is to be
> done via multicalls, the multi-entry requests here are meant to be
> used for successive operations that _must_ be issued without
> preemption (e.g. an MSR write needed for a successive read).
> 
You remind me that the requirement here is more than "without
preemption", ideally the exact timestamp the moment that the counter
being read should be returned, which is used by used space to calculate
the time elapsed between two samplings.  So the MSR read and TSC
read should be atomic, which means, 1) preemption should not happen,
2) interrupt should be disabled. Sounds more complex as we have no existed
way to disable interrupt, using back-to-back flag again?

> >  struct xen_resource_access {
> >      unsigned int nr_done;
> >      unsigned int nr_entries;
> >      xenpf_resource_entry_t *entries;
> >  };
> >  
> > -static bool_t allow_access_msr(unsigned int msr)
> > +static bool_t allow_access_msr(unsigned int msr, unsigned int cmd)
> 
> Since I'm going to ask you (below) to special case MSR_IA32_TSC on
> the read path, I think the write denial should also be handled specially,
> and also ideally with a distinguishable (from the -EACCES) error code
> (perhaps -EPERM). I.e. this change should be dropped.
> 
> > @@ -102,7 +104,7 @@ static void check_resource_access(struct xen_resource_access *ra)
> >          case XEN_RESOURCE_OP_MSR_WRITE:
> >              if ( entry->idx >> 32 )
> >                  ret = -EINVAL;
> > -            else if ( !allow_access_msr(entry->idx) )
> > +            else if ( !allow_access_msr(entry->idx, entry->u.cmd) )
> >                  ret = -EACCES;
> >              break;
> 
> Unless you have a reason for the read to be done through RDMSR,
> I'd really prefer you to special case the TSC and use RDTSC on the
> read side (and, as said above, deny the write in resource_access()
> rather than in check_resource_access()).

Sure, I agree to use RDTSC(with a new command XEN_RESOURCE_OP_TSC_READ
added) and the write path will not be supported(e.g. no
XEN_RESOURCE_OP_TSC_WRITE defined).

Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-20 11:20     ` Chao Peng
@ 2015-01-20 11:44       ` Jan Beulich
  2015-01-20 14:31         ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2015-01-20 11:44 UTC (permalink / raw)
  To: Chao Peng
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	xen-devel, will.auld, keir

>>> On 20.01.15 at 12:20, <chao.p.peng@linux.intel.com> wrote:
> On Tue, Jan 20, 2015 at 10:02:32AM +0000, Jan Beulich wrote:
>> >>> On 20.01.15 at 10:45, <chao.p.peng@linux.intel.com> wrote:
>> > Memory bandwidth monitoring requires timestamp returned along with the
>> > monitoring counter to verify the correctness of the counter value.
>> > Allow MSR_IA32_TSC to be read and increase RESOURCE_ACCESS_MAX_ENTRIES
>> > to 3.
>> 
>> Do you really need an MSR access here, i.e. is RDTSC not
>> sufficient?
> 
> RDTSC is enough. The purpose of using MSR is to reuse the existed
> XEN_RESOURCE_OP_MSR_READ command. A new command XEN_RESOURCE_OP_TSC_READ
> (looks like not so generic) is needed If using RDTSC. If it’s not the
> issue, then RDTSC can be used instead.

I don't see the need for a sub-op - just filter out MSR_IA32_TSC
in the existing MSR read/write code paths.

>> > --- a/xen/arch/x86/platform_hypercall.c
>> > +++ b/xen/arch/x86/platform_hypercall.c
>> > @@ -61,14 +61,14 @@ long cpu_down_helper(void *data);
>> >  long core_parking_helper(void *data);
>> >  uint32_t get_cur_idle_nums(void);
>> >  
>> > -#define RESOURCE_ACCESS_MAX_ENTRIES 2
>> > +#define RESOURCE_ACCESS_MAX_ENTRIES 3
>> 
>> No - the limit on the number of consecutive entries doesn't change
>> because of your addition of another MSR. Actual batching is to be
>> done via multicalls, the multi-entry requests here are meant to be
>> used for successive operations that _must_ be issued without
>> preemption (e.g. an MSR write needed for a successive read).
>> 
> You remind me that the requirement here is more than "without
> preemption", ideally the exact timestamp the moment that the counter
> being read should be returned, which is used by used space to calculate
> the time elapsed between two samplings.  So the MSR read and TSC
> read should be atomic, which means, 1) preemption should not happen,
> 2) interrupt should be disabled. Sounds more complex as we have no existed
> way to disable interrupt, using back-to-back flag again?

Yeah, in that case pairing the access would be necessary. And
we have a rsvd field in struct xenpf_resource_entry, which you
could use to request IRQ disabling across the pair of accesses.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs
  2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
                   ` (4 preceding siblings ...)
  2015-01-20  9:45 ` [PATCH v4 5/5] tools: add total/local memory bandwith monitoring Chao Peng
@ 2015-01-20 13:49 ` Andrew Cooper
  5 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-01-20 13:49 UTC (permalink / raw)
  To: Chao Peng, xen-devel
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	will.auld, JBeulich, keir

On 20/01/15 09:45, Chao Peng wrote:
> Changes from v3:
> * Get timestamp information from host along with the monitoring counter;
>   This is required for counter overlow detection.
> * Address comments from Wei on the last patch.
>
> Changes from v2:
> * Remove the usage of "static" to cache data in xc;
>   NOTE: Other places that already existed before are not touched due to
>         the needs for API change. Will fix in separate patch if desirable.
> * Coding style;
>
> Changes from v1:
> * Move event type check from xc to xl;
> * Add retry capability for MBM sampling;
> * Fix Coding style/docs;
>
> Intel Memory Bandwidth Monitoring(MBM) is a new hardware feature
> which builds on the CMT infrastructure to allow monitoring of system
> memory bandwidth. Event codes are provided to monitor both "total"
> and "local" bandwidth, meaning bandwidth over QPI and other external
> links can be monitored.
>
> For XEN, MBM is used to monitor memory bandwidth for VMs. Due to its
> dependency on CMT, the software also makes use of most of CMT codes.
> Actually, besides introducing two additional events and some cpuid
> feature bits, there are no extra changes compared to cache occupancy
> monitoring in CMT. Due to this, CMT should be enabled first to use
> this feature.
>
> For interface changes, the patch serial introduces a new command
> "XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature
> capability to user space and modified "resource_op" to support reading
> host timestamp.
> Tool stack side, two additional options introduced for "xl psr-cmt-show":
> total_mem_bandwidth:     Show total memory bandwidth
> local_mem_bandwidth:     Show local memory bandwidth
>
> The usage flow keeps the same with CMT.

Please, somewhere in this series (even if it is a solitary extra patch),
patch the command line documentation to add MBM next to the sentence
about CMT.

~Andrew

>
> Chao Peng (5):
>   x86: allow reading MSR_IA32_TSC with XENPF_resource_op
>   tools: add routine to get CMT L3 event mask
>   tools: correct coding style for psr
>   tools: code refactoring for MBM
>   tools: add total/local memory bandwith monitoring
>
>  docs/man/xl.pod.1                 |   9 +++
>  tools/libxc/include/xenctrl.h     |  14 ++--
>  tools/libxc/xc_msr_x86.h          |   1 +
>  tools/libxc/xc_psr.c              |  59 ++++++++++++---
>  tools/libxl/libxl.h               |  22 +++++-
>  tools/libxl/libxl_psr.c           | 155 ++++++++++++++++++++++++++++++++++----
>  tools/libxl/libxl_types.idl       |   2 +
>  tools/libxl/xl_cmdimpl.c          |  78 +++++++++++++------
>  tools/libxl/xl_cmdtable.c         |   4 +-
>  xen/arch/x86/platform_hypercall.c |   8 +-
>  10 files changed, 295 insertions(+), 57 deletions(-)
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op
  2015-01-20 11:44       ` Jan Beulich
@ 2015-01-20 14:31         ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-01-20 14:31 UTC (permalink / raw)
  To: Jan Beulich, Chao Peng
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, Ian.Jackson,
	xen-devel, will.auld, keir

On 20/01/15 11:44, Jan Beulich wrote:
>>>> On 20.01.15 at 12:20, <chao.p.peng@linux.intel.com> wrote:
>> On Tue, Jan 20, 2015 at 10:02:32AM +0000, Jan Beulich wrote:
>>>>>> On 20.01.15 at 10:45, <chao.p.peng@linux.intel.com> wrote:
>>>> Memory bandwidth monitoring requires timestamp returned along with the
>>>> monitoring counter to verify the correctness of the counter value.
>>>> Allow MSR_IA32_TSC to be read and increase RESOURCE_ACCESS_MAX_ENTRIES
>>>> to 3.
>>> Do you really need an MSR access here, i.e. is RDTSC not
>>> sufficient?
>> RDTSC is enough. The purpose of using MSR is to reuse the existed
>> XEN_RESOURCE_OP_MSR_READ command. A new command XEN_RESOURCE_OP_TSC_READ
>> (looks like not so generic) is needed If using RDTSC. If it’s not the
>> issue, then RDTSC can be used instead.
> I don't see the need for a sub-op - just filter out MSR_IA32_TSC
> in the existing MSR read/write code paths.
>
>>>> --- a/xen/arch/x86/platform_hypercall.c
>>>> +++ b/xen/arch/x86/platform_hypercall.c
>>>> @@ -61,14 +61,14 @@ long cpu_down_helper(void *data);
>>>>  long core_parking_helper(void *data);
>>>>  uint32_t get_cur_idle_nums(void);
>>>>  
>>>> -#define RESOURCE_ACCESS_MAX_ENTRIES 2
>>>> +#define RESOURCE_ACCESS_MAX_ENTRIES 3
>>> No - the limit on the number of consecutive entries doesn't change
>>> because of your addition of another MSR. Actual batching is to be
>>> done via multicalls, the multi-entry requests here are meant to be
>>> used for successive operations that _must_ be issued without
>>> preemption (e.g. an MSR write needed for a successive read).
>>>
>> You remind me that the requirement here is more than "without
>> preemption", ideally the exact timestamp the moment that the counter
>> being read should be returned, which is used by used space to calculate
>> the time elapsed between two samplings.  So the MSR read and TSC
>> read should be atomic, which means, 1) preemption should not happen,
>> 2) interrupt should be disabled. Sounds more complex as we have no existed
>> way to disable interrupt, using back-to-back flag again?
> Yeah, in that case pairing the access would be necessary. And
> we have a rsvd field in struct xenpf_resource_entry, which you
> could use to request IRQ disabling across the pair of accesses.

In terms of the actual time measurement, would it not be better to use
NOW() instead?  That will allow you to get already-scaled ns instead of
having to interpret raw tsc values.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-01-20 14:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-20  9:45 [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Chao Peng
2015-01-20  9:45 ` [PATCH v4 1/5] x86: allow reading MSR_IA32_TSC with XENPF_resource_op Chao Peng
2015-01-20 10:02   ` Jan Beulich
2015-01-20 11:20     ` Chao Peng
2015-01-20 11:44       ` Jan Beulich
2015-01-20 14:31         ` Andrew Cooper
2015-01-20  9:45 ` [PATCH v4 2/5] tools: add routine to get CMT L3 event mask Chao Peng
2015-01-20  9:45 ` [PATCH v4 3/5] tools: correct coding style for psr Chao Peng
2015-01-20  9:45 ` [PATCH v4 4/5] tools: code refactoring for MBM Chao Peng
2015-01-20  9:45 ` [PATCH v4 5/5] tools: add total/local memory bandwith monitoring Chao Peng
2015-01-20 13:49 ` [PATCH v4 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.