* [PATCH v9 0/6] enable Cache QoS Monitoring (CQM) feature
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Changes from v8:
 - Address comments from Ian Campbell, including:
   * Modify the return handling for xc_sysctl();
   * Add man page items for platform QoS related commands.
   * Fix typo in commit message.

Changes from v7:
 - Address comments from Andrew Cooper, including:
   * Check CQM capability before allocating cpumask memory.
   * Move one function declaration into the correct patch.

Changes from v6:
 - Address comments from Jan Beulich, including:
   * Remove the unnecessary CPUID feature check.
   * Remove the unnecessary socket_cpu_map.
   * Spin_lock related changes, avoid spin_lock_irqsave().
   * Use readonly mapping to pass cqm data between Xen/Userspace,
     to avoid data copying.
   * Optimize RDMSR/WRMSR logic to avoid unnecessary calls.
   * Misc fixes including __read_mostly prefix, return value, etc.

Changes from v5:
 - Address comments from Dario Faggioli, including:
   * Define a new libxl_cqminfo structure to avoid referencing the xc
     structure in libxl functions.
   * Use LOGE() instead of the LIBXL__LOG() functions.

Changes from v4:
 - When comparing the xl cqm parameter, use strcmp instead of strncmp;
   otherwise "xl pqos-attach cqmabcd domid" would be accepted as a valid
   command line.
 - Address comments from Andrew Cooper, including:
   * Adjust the pqos parameter parsing function.
   * Modify the pqos related documentation.
   * Add a check for opt_cqm_max_rmid in initialization code.
   * Do not IPI a CPU that is in the same socket as the current CPU.
 - Address comments from Dario Faggioli, including:
   * Fix a typo in the exported symbols.
   * Return correct libxl error code for qos related functions.
   * Abstract the error printing logic into a function.
 - Address comment from Daniel De Graaf, including:
   * Add a return value for the pqos related check.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Modify the GPLv2 related file header, remove the address.

Changes from v3:
 - Use a structure to better organize CQM related global variables.
 - Address comments from Andrew Cooper, including:
   * Remove the domain creation flag for CQM RMID allocation.
   * Adjust the boot parameter format, use custom_param().
   * Add documentation for the newly added boot parameter.
   * Change QoS type flag to be uint64_t.
   * Initialize the per socket cpu bitmap at system boot time.
   * Remove the get_cqm_avail() function.
   * Misc format changes.
 - Address comment from Daniel De Graaf, including:
   * Use avc_current_has_perm() for XEN2__PQOS_OP that belongs to SECCLASS_XEN2.

Changes from v2:
 - Address comments from Andrew Cooper, including:
   * Merge tools stack changes into one patch.
   * Reduce the IPI number to one per socket.
   * Change structures for CQM data exchange between tools and Xen.
   * Misc format/variable/function name changes.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Simplify the error printing logic.
   * Add xsm checks for the newly added hypercalls.

Changes from v1:
 - Address comments from Andrew Cooper, including:
   * Change function names, e.g., alloc_cqm_rmid(), system_supports_cqm(), etc.
   * Change some structure element order to save packing cost.
   * Correct some functions' return values.
   * Some programming style changes.
   * ...

Future generations of the Intel Xeon processor may offer a monitoring
capability in each logical processor to measure specific quality-of-service
metrics, for example, Cache QoS Monitoring to get the L3 cache occupancy.
For detailed information, please refer to Intel SDM chapter 17.14.

Cache QoS Monitoring provides a layer of abstraction between applications and
logical processors through the use of Resource Monitoring IDs (RMIDs).
In the Xen design, each guest in the system can be assigned an RMID
independently, while RMID=0 is reserved for domains that do not have the CQM
service enabled. When any of a domain's vcpus is scheduled on a logical
processor, the domain's RMID is activated by programming its value into a
specific MSR, and when the vcpu is scheduled out, RMID=0 is programmed into
that MSR. The Cache QoS hardware tracks the cache utilization of memory
accesses according to the RMIDs and reports the monitored data via a counter
register. With this solution, we can learn how much L3 cache is used by a
certain guest.
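
As a rough illustration of the mechanism (a simplified sketch based on the
code in this series, not the literal hypervisor source), the feature is
enumerated via CPUID leaf 0xf, and each RMID's occupancy is then read with
an MSR pair:

    /* CPUID.0xf.1: EBX = upscaling factor, ECX = highest RMID. */
    cpuid_count(0xf, 1, &eax, &upscaling_factor, &max_rmid, &edx);

    /* Select the L3 occupancy event for a given RMID ... */
    wrmsr(MSR_IA32_QOSEVTSEL, QOS_MONITOR_EVTID_L3, rmid);
    /* ... then read the counter; its top bits flag invalid data. */
    rdmsrl(MSR_IA32_QMC, data);
    if ( !(data & IA32_QM_CTR_ERROR_MASK) )
        occupancy_bytes = data * upscaling_factor;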

To attach the CQM service to a certain guest, two approaches are provided:
1) Create the guest with "pqos_cqm=1" set in the configuration file.
2) Use "xl pqos-attach cqm domid" for a running guest.

To detach the CQM service from a guest, users can:
1) Use "xl pqos-detach cqm domid" for a running guest.
2) Destroy the guest, which also detaches the CQM service.

To get the L3 cache usage, users can use the following command:
$ xl pqos-list cqm

The data below is just an example showing how the CQM related data is exposed
to the end user.

[root@localhost]# xl pqos-list cqm
Name               ID  SocketID        L3C_Usage       SocketID        L3C_Usage
Domain-0            0         0         20127744              1         25231360
ExampleHVMDomain    1         0          3211264              1         10551296

RMID count    56        RMID available    53

Dongxiao Xu (6):
  x86: detect and initialize Cache QoS Monitoring feature
  x86: dynamically attach/detach CQM service for a guest
  x86: collect CQM information from all sockets
  x86: enable CQM monitoring for each domain RMID
  xsm: add platform QoS related xsm policies
  tools: enable Cache QoS Monitoring feature for libxl/libxc

 docs/man/xl.pod.1                            |   23 +++
 docs/misc/xen-command-line.markdown          |    7 +
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    5 +-
 tools/libxc/xc_domain.c                      |   36 ++++
 tools/libxc/xenctrl.h                        |   12 ++
 tools/libxl/Makefile                         |    3 +-
 tools/libxl/libxl.h                          |    4 +
 tools/libxl/libxl_pqos.c                     |  132 +++++++++++++
 tools/libxl/libxl_types.idl                  |    7 +
 tools/libxl/xl.h                             |    3 +
 tools/libxl/xl_cmdimpl.c                     |  111 +++++++++++
 tools/libxl/xl_cmdtable.c                    |   15 ++
 xen/arch/x86/Makefile                        |    1 +
 xen/arch/x86/domain.c                        |    8 +
 xen/arch/x86/domctl.c                        |   28 +++
 xen/arch/x86/pqos.c                          |  273 ++++++++++++++++++++++++++
 xen/arch/x86/setup.c                         |    3 +
 xen/arch/x86/sysctl.c                        |   58 ++++++
 xen/include/asm-x86/cpufeature.h             |    1 +
 xen/include/asm-x86/domain.h                 |    2 +
 xen/include/asm-x86/msr-index.h              |    5 +
 xen/include/asm-x86/pqos.h                   |   59 ++++++
 xen/include/public/domctl.h                  |   11 ++
 xen/include/public/sysctl.h                  |   11 ++
 xen/xsm/flask/hooks.c                        |    8 +
 xen/xsm/flask/policy/access_vectors          |   17 +-
 27 files changed, 839 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxl/libxl_pqos.c
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

-- 
1.7.9.5

* [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Detect the platform QoS feature status and enumerate the resource types,
one of which is monitoring of the L3 cache occupancy.

Also introduce a Xen grub command line parameter to control the
QoS feature status.
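
For example, a host could be booted with (illustrative values, using the
parameter format documented by the hunk below):

    pqos=1,cqm:1,cqm_max_rmid:63

which enables the platform QoS services but caps the highest usable CQM
RMID at 63.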

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 docs/misc/xen-command-line.markdown |    7 ++
 xen/arch/x86/Makefile               |    1 +
 xen/arch/x86/pqos.c                 |  156 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c                |    3 +
 xen/include/asm-x86/cpufeature.h    |    1 +
 xen/include/asm-x86/pqos.h          |   43 ++++++++++
 6 files changed, 211 insertions(+)
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/include/asm-x86/pqos.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 15aa404..7751ffe 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -770,6 +770,13 @@ This option can be specified more than once (up to 8 times at present).
 ### ple\_window
 > `= <integer>`
 
+### pqos (Intel)
+> `= List of ( <boolean> | cqm:<boolean> | cqm_max_rmid:<integer> )`
+
+> Default: `pqos=1,cqm:1,cqm_max_rmid:255`
+
+Configure platform QoS services.
+
 ### reboot
 > `= t[riple] | k[bd] | n[o] [, [w]arm | [c]old]`
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d502bdf..54962e0 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -58,6 +58,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += pqos.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
new file mode 100644
index 0000000..ba0de37
--- /dev/null
+++ b/xen/arch/x86/pqos.c
@@ -0,0 +1,156 @@
+/*
+ * pqos.c: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Jiongxi Li  <jiongxi.li@intel.com>
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <asm/processor.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <asm/pqos.h>
+
+static bool_t __initdata opt_pqos = 1;
+static bool_t __initdata opt_cqm = 1;
+static unsigned int __initdata opt_cqm_max_rmid = 255;
+
+static void __init parse_pqos_param(char *s)
+{
+    char *ss, *val_str;
+    int val;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        val = parse_bool(s);
+        if ( val >= 0 )
+            opt_pqos = val;
+        else
+        {
+            val = !!strncmp(s, "no-", 3);
+            if ( !val )
+                s += 3;
+
+            val_str = strchr(s, ':');
+            if ( val_str )
+                *val_str++ = '\0';
+
+            if ( val_str && !strcmp(s, "cqm") &&
+                 (val = parse_bool(val_str)) >= 0 )
+                opt_cqm = val;
+            else if ( val_str && !strcmp(s, "cqm_max_rmid") )
+                opt_cqm_max_rmid = simple_strtoul(val_str, NULL, 0);
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+custom_param("pqos", parse_pqos_param);
+
+struct pqos_cqm __read_mostly *cqm = NULL;
+
+static void __init init_cqm(void)
+{
+    unsigned int rmid;
+    unsigned int eax, edx;
+    unsigned int cqm_pages;
+    unsigned int i;
+
+    if ( !opt_cqm_max_rmid )
+        return;
+
+    cqm = xzalloc(struct pqos_cqm);
+    if ( !cqm )
+        return;
+
+    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->max_rmid, &edx);
+    if ( !(edx & QOS_MONITOR_EVTID_L3) )
+        goto out;
+
+    cqm->min_rmid = 1;
+    cqm->max_rmid = min(opt_cqm_max_rmid, cqm->max_rmid);
+
+    cqm->rmid_to_dom = xmalloc_array(domid_t, cqm->max_rmid + 1);
+    if ( !cqm->rmid_to_dom )
+        goto out;
+
+    /* Reserve RMID 0 for all domains not being monitored */
+    cqm->rmid_to_dom[0] = DOMID_XEN;
+    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
+        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
+
+    /* Allocate CQM buffer size in initialization stage */
+    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
+                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/
+                PAGE_SIZE + 1;
+    cqm->buffer_size = cqm_pages * PAGE_SIZE;
+
+    cqm->buffer = alloc_xenheap_pages(get_order_from_pages(cqm_pages), 0);
+    if ( !cqm->buffer )
+    {
+        xfree(cqm->rmid_to_dom);
+        goto out;
+    }
+    memset(cqm->buffer, 0, cqm->buffer_size);
+
+    for ( i = 0; i < cqm_pages; i++ )
+        share_xen_page_with_privileged_guests(
+            virt_to_page((void *)((unsigned long)cqm->buffer + i * PAGE_SIZE)),
+            XENSHARE_readonly);
+
+    spin_lock_init(&cqm->cqm_lock);
+
+    cqm->used_rmid = 0;
+
+    printk(XENLOG_INFO "Cache QoS Monitoring Enabled.\n");
+
+    return;
+
+out:
+    xfree(cqm);
+    cqm = NULL;
+}
+
+static void __init init_qos_monitor(void)
+{
+    unsigned int qm_features;
+    unsigned int eax, ebx, ecx;
+
+    if ( !(boot_cpu_has(X86_FEATURE_QOSM)) )
+        return;
+
+    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &qm_features);
+
+    if ( opt_cqm && (qm_features & QOS_MONITOR_TYPE_L3) )
+        init_cqm();
+}
+
+void __init init_platform_qos(void)
+{
+    if ( !opt_pqos )
+        return;
+
+    init_qos_monitor();
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b49256d..639528f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -48,6 +48,7 @@
 #include <asm/setup.h>
 #include <xen/cpu.h>
 #include <asm/nmi.h>
+#include <asm/pqos.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
@@ -1419,6 +1420,8 @@ void __init __start_xen(unsigned long mbi_p)
 
     domain_unpause_by_systemcontroller(dom0);
 
+    init_platform_qos();
+
     reset_stack_and_jump(init_done);
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 1cfaf94..ca59668 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -147,6 +147,7 @@
 #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
 #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
 #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
 #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
 
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
new file mode 100644
index 0000000..0a8065c
--- /dev/null
+++ b/xen/include/asm-x86/pqos.h
@@ -0,0 +1,43 @@
+/*
+ * pqos.h: Platform QoS related service for guest.
+ *
+ * Copyright (c) 2014, Intel Corporation
+ * Author: Jiongxi Li  <jiongxi.li@intel.com>
+ * Author: Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef ASM_PQOS_H
+#define ASM_PQOS_H
+
+#include <public/xen.h>
+#include <xen/spinlock.h>
+
+/* QoS Resource Type Enumeration */
+#define QOS_MONITOR_TYPE_L3            0x2
+
+/* QoS Monitoring Event ID */
+#define QOS_MONITOR_EVTID_L3           0x1
+
+struct pqos_cqm {
+    spinlock_t cqm_lock;
+    uint64_t *buffer;
+    unsigned int min_rmid;
+    unsigned int max_rmid;
+    unsigned int used_rmid;
+    unsigned int upscaling_factor;
+    unsigned int buffer_size;
+    domid_t *rmid_to_dom;
+};
+extern struct pqos_cqm *cqm;
+
+void init_platform_qos(void);
+
+#endif
-- 
1.7.9.5

* [PATCH v9 2/6] x86: dynamically attach/detach CQM service for a guest
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Add hypervisor side support for dynamically attaching and detaching the CQM
service for a certain guest.

When the CQM service is attached to a guest, the system allocates an RMID
for it. When the service is detached or the guest is shut down, the RMID is
reclaimed for future use.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/domain.c        |    3 +++
 xen/arch/x86/domctl.c        |   28 ++++++++++++++++++++
 xen/arch/x86/pqos.c          |   60 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |    2 ++
 xen/include/asm-x86/pqos.h   |   12 +++++++++
 xen/include/public/domctl.h  |   11 ++++++++
 6 files changed, 116 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 16f2b50..2656204 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -60,6 +60,7 @@
 #include <xen/numa.h>
 #include <xen/iommu.h>
 #include <compat/vcpu.h>
+#include <asm/pqos.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 DEFINE_PER_CPU(unsigned long, cr4);
@@ -612,6 +613,8 @@ void arch_domain_destroy(struct domain *d)
 
     free_xenheap_page(d->shared_info);
     cleanup_domain_irq_mapping(d);
+
+    free_cqm_rmid(d);
 }
 
 unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ef6c140..7219011 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -35,6 +35,7 @@
 #include <asm/mem_sharing.h>
 #include <asm/xstate.h>
 #include <asm/debugger.h>
+#include <asm/pqos.h>
 
 static int gdbsx_guest_mem_io(
     domid_t domid, struct xen_domctl_gdbsx_memio *iop)
@@ -1245,6 +1246,33 @@ long arch_do_domctl(
     }
     break;
 
+    case XEN_DOMCTL_attach_pqos:
+    {
+        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
+            ret = -EINVAL;
+        else if ( !system_supports_cqm() )
+            ret = -ENODEV;
+        else
+            ret = alloc_cqm_rmid(d);
+    }
+    break;
+
+    case XEN_DOMCTL_detach_pqos:
+    {
+        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
+            ret = -EINVAL;
+        else if ( !system_supports_cqm() )
+            ret = -ENODEV;
+        else if ( d->arch.pqos_cqm_rmid > 0 )
+        {
+            free_cqm_rmid(d);
+            ret = 0;
+        }
+        else
+            ret = -ENOENT;
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, d, u_domctl);
         break;
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index ba0de37..eb469ac 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -17,6 +17,7 @@
 #include <asm/processor.h>
 #include <xen/init.h>
 #include <xen/mm.h>
+#include <xen/spinlock.h>
 #include <asm/pqos.h>
 
 static bool_t __initdata opt_pqos = 1;
@@ -145,6 +146,65 @@ void __init init_platform_qos(void)
     init_qos_monitor();
 }
 
+int alloc_cqm_rmid(struct domain *d)
+{
+    int rc = 0;
+    unsigned int rmid;
+
+    ASSERT(system_supports_cqm());
+
+    spin_lock(&cqm->cqm_lock);
+
+    if ( d->arch.pqos_cqm_rmid > 0 )
+    {
+        rc = -EEXIST;
+        goto out;
+    }
+
+    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
+    {
+        if ( cqm->rmid_to_dom[rmid] != DOMID_INVALID)
+            continue;
+
+        cqm->rmid_to_dom[rmid] = d->domain_id;
+        break;
+    }
+
+    /* No CQM RMID available, assign RMID=0 by default */
+    if ( rmid > cqm->max_rmid )
+    {
+        rmid = 0;
+        rc = -EUSERS;
+    }
+    else
+        cqm->used_rmid++;
+
+    d->arch.pqos_cqm_rmid = rmid;
+
+out:
+    spin_unlock(&cqm->cqm_lock);
+
+    return rc;
+}
+
+void free_cqm_rmid(struct domain *d)
+{
+    unsigned int rmid;
+
+    spin_lock(&cqm->cqm_lock);
+    rmid = d->arch.pqos_cqm_rmid;
+    /* We do not free system reserved "RMID=0" */
+    if ( rmid == 0 )
+        goto out;
+
+    cqm->rmid_to_dom[rmid] = DOMID_INVALID;
+    d->arch.pqos_cqm_rmid = 0;
+    cqm->used_rmid--;
+
+out:
+    spin_unlock(&cqm->cqm_lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index ea72db2..662714d 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -313,6 +313,8 @@ struct arch_domain
     spinlock_t e820_lock;
     struct e820entry *e820;
     unsigned int nr_e820;
+
+    unsigned int pqos_cqm_rmid;       /* CQM RMID assigned to the domain */
 } __cacheline_aligned;
 
 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 0a8065c..f25037d 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -16,6 +16,7 @@
  */
 #ifndef ASM_PQOS_H
 #define ASM_PQOS_H
+#include <xen/sched.h>
 
 #include <public/xen.h>
 #include <xen/spinlock.h>
@@ -38,6 +39,17 @@ struct pqos_cqm {
 };
 extern struct pqos_cqm *cqm;
 
+static inline bool_t system_supports_cqm(void)
+{
+    return !!cqm;
+}
+
+/* IA32_QM_CTR */
+#define IA32_QM_CTR_ERROR_MASK         (0x3ul << 62)
+
 void init_platform_qos(void);
 
+int alloc_cqm_rmid(struct domain *d);
+void free_cqm_rmid(struct domain *d);
+
 #endif
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 91f01fa..f8d9293 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -885,6 +885,14 @@ struct xen_domctl_set_max_evtchn {
 typedef struct xen_domctl_set_max_evtchn xen_domctl_set_max_evtchn_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_max_evtchn_t);
 
+struct xen_domctl_qos_type {
+#define _XEN_DOMCTL_pqos_cqm      0
+#define XEN_DOMCTL_pqos_cqm       (1U<<_XEN_DOMCTL_pqos_cqm)
+    uint64_t flags;
+};
+typedef struct xen_domctl_qos_type xen_domctl_qos_type_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_qos_type_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -954,6 +962,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_setnodeaffinity               68
 #define XEN_DOMCTL_getnodeaffinity               69
 #define XEN_DOMCTL_set_max_evtchn                70
+#define XEN_DOMCTL_attach_pqos                   71
+#define XEN_DOMCTL_detach_pqos                   72
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1014,6 +1024,7 @@ struct xen_domctl {
         struct xen_domctl_set_broken_page_p2m set_broken_page_p2m;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
+        struct xen_domctl_qos_type          qos_type;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5

* [PATCH v9 3/6] x86: collect CQM information from all sockets
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Collect CQM information (L3 cache occupancy) from all sockets.
An upper layer application can parse the data structure to get the
information on a guest's L3 cache occupancy on certain sockets.
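
For reference, the shared buffer filled in below has the following layout (a
descriptive sketch derived from the code, not a declaration that appears in
the patch):

    /* One L3 occupancy value (in bytes) per socket per RMID ... */
    uint64_t l3c_data[nr_sockets][nr_rmids];
    /* ... immediately followed by the RMID -> domain_id mapping. */
    domid_t  rmid_to_dom[nr_rmids];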

Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/pqos.c             |   43 +++++++++++++++++++++++++++++
 xen/arch/x86/sysctl.c           |   58 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/msr-index.h |    4 +++
 xen/include/asm-x86/pqos.h      |    3 ++
 xen/include/public/sysctl.h     |   11 ++++++++
 5 files changed, 119 insertions(+)

diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index eb469ac..2cde56e 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -15,6 +15,7 @@
  * more details.
  */
 #include <asm/processor.h>
+#include <asm/msr.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/spinlock.h>
@@ -205,6 +206,48 @@ out:
     spin_unlock(&cqm->cqm_lock);
 }
 
+static void read_cqm_data(void *arg)
+{
+    uint64_t cqm_data;
+    unsigned int rmid;
+    int socket = cpu_to_socket(smp_processor_id());
+    unsigned long i;
+
+    ASSERT(system_supports_cqm());
+
+    if ( socket < 0 )
+        return;
+
+    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
+    {
+        if ( cqm->rmid_to_dom[rmid] == DOMID_INVALID )
+            continue;
+
+        wrmsr(MSR_IA32_QOSEVTSEL, QOS_MONITOR_EVTID_L3, rmid);
+        rdmsrl(MSR_IA32_QMC, cqm_data);
+
+        i = (unsigned long)(cqm->max_rmid + 1) * socket + rmid;
+        if ( !(cqm_data & IA32_QM_CTR_ERROR_MASK) )
+            cqm->buffer[i] = cqm_data * cqm->upscaling_factor;
+    }
+}
+
+void get_cqm_info(const cpumask_t *cpu_cqmdata_map)
+{
+    unsigned int nr_sockets = cpumask_weight(cpu_cqmdata_map) + 1;
+    unsigned int nr_rmids = cqm->max_rmid + 1;
+
+    /* Read CQM data in current CPU */
+    read_cqm_data(NULL);
+    /* Issue IPI to other CPUs to read CQM data */
+    on_selected_cpus(cpu_cqmdata_map, read_cqm_data, NULL, 1);
+
+    /* Copy the rmid_to_dom info to the buffer */
+    memcpy(cqm->buffer + nr_sockets * nr_rmids, cqm->rmid_to_dom,
+           sizeof(domid_t) * (cqm->max_rmid + 1));
+
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 15d4b91..7b0acc9 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -28,6 +28,7 @@
 #include <xen/nodemask.h>
 #include <xen/cpu.h>
 #include <xsm/xsm.h>
+#include <asm/pqos.h>
 
 #define get_xen_guest_handle(val, hnd)  do { val = (hnd).p; } while (0)
 
@@ -66,6 +67,30 @@ void arch_do_physinfo(xen_sysctl_physinfo_t *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm_directio;
 }
 
+/* Select one random CPU for each socket. Current CPU's socket is excluded */
+static void select_socket_cpu(cpumask_t *cpu_bitmap)
+{
+    int i;
+    unsigned int cpu;
+    int socket, socket_curr = cpu_to_socket(smp_processor_id());
+    DECLARE_BITMAP(sockets, NR_CPUS);
+
+    bitmap_zero(sockets, NR_CPUS);
+    if (socket_curr >= 0)
+        set_bit(socket_curr, sockets);
+
+    cpumask_clear(cpu_bitmap);
+    for ( i = 0; i < NR_CPUS; i++ )
+    {
+        socket = cpu_to_socket(i);
+        if ( socket < 0 || test_and_set_bit(socket, sockets) )
+            continue;
+        cpu = cpumask_any(per_cpu(cpu_core_mask, i));
+        if ( cpu < nr_cpu_ids )
+            cpumask_set_cpu(cpu, cpu_bitmap);
+    }
+}
+
 long arch_do_sysctl(
     struct xen_sysctl *sysctl, XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -101,6 +126,39 @@ long arch_do_sysctl(
     }
     break;
 
+    case XEN_SYSCTL_getcqminfo:
+    {
+        cpumask_var_t cpu_cqmdata_map;
+
+        if ( !system_supports_cqm() )
+        {
+            ret = -ENODEV;
+            break;
+        }
+
+        if ( !zalloc_cpumask_var(&cpu_cqmdata_map) )
+        {
+            ret = -ENOMEM;
+            break;
+        }
+
+        memset(cqm->buffer, 0, cqm->buffer_size);
+
+        select_socket_cpu(cpu_cqmdata_map);
+        get_cqm_info(cpu_cqmdata_map);
+
+        sysctl->u.getcqminfo.buffer_mfn = virt_to_mfn(cqm->buffer);
+        sysctl->u.getcqminfo.size = cqm->buffer_size;
+        sysctl->u.getcqminfo.nr_rmids = cqm->max_rmid + 1;
+        sysctl->u.getcqminfo.nr_sockets = cpumask_weight(cpu_cqmdata_map) + 1;
+
+        if ( __copy_to_guest(u_sysctl, sysctl, 1) )
+            ret = -EFAULT;
+
+        free_cpumask_var(cpu_cqmdata_map);
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index fc9fbc6..e3ff10c 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -489,4 +489,8 @@
 /* Geode defined MSRs */
 #define MSR_GEODE_BUSCONT_CONF0		0x00001900
 
+/* Platform QoS register */
+#define MSR_IA32_QOSEVTSEL             0x00000c8d
+#define MSR_IA32_QMC                   0x00000c8e
+
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index f25037d..4372af6 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -17,6 +17,8 @@
 #ifndef ASM_PQOS_H
 #define ASM_PQOS_H
 #include <xen/sched.h>
+#include <xen/cpumask.h>
+#include <public/domctl.h>
 
 #include <public/xen.h>
 #include <xen/spinlock.h>
@@ -51,5 +53,6 @@ void init_platform_qos(void);
 
 int alloc_cqm_rmid(struct domain *d);
 void free_cqm_rmid(struct domain *d);
+void get_cqm_info(const cpumask_t *cpu_cqmdata_map);
 
 #endif
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 8437d31..335b1d9 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -632,6 +632,15 @@ struct xen_sysctl_coverage_op {
 typedef struct xen_sysctl_coverage_op xen_sysctl_coverage_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
 
+struct xen_sysctl_getcqminfo {
+    uint64_aligned_t buffer_mfn;
+    uint32_t size;
+    uint32_t nr_rmids;
+    uint32_t nr_sockets;
+};
+typedef struct xen_sysctl_getcqminfo xen_sysctl_getcqminfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_getcqminfo_t);
+
 
 struct xen_sysctl {
     uint32_t cmd;
@@ -654,6 +663,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
+#define XEN_SYSCTL_getcqminfo                    21
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -675,6 +685,7 @@ struct xen_sysctl {
         struct xen_sysctl_cpupool_op        cpupool_op;
         struct xen_sysctl_scheduler_op      scheduler_op;
         struct xen_sysctl_coverage_op       coverage_op;
+        struct xen_sysctl_getcqminfo        getcqminfo;
         uint8_t                             pad[128];
     } u;
 };
-- 
1.7.9.5

* [PATCH v9 4/6] x86: enable CQM monitoring for each domain RMID
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

If the CQM service is attached to a domain, its related RMID will be set in
hardware for monitoring when the domain's vcpu is scheduled in. When the
domain's vcpu is scheduled out, RMID 0 (system reserved) will be set for
monitoring.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/domain.c           |    5 +++++
 xen/arch/x86/pqos.c             |   14 ++++++++++++++
 xen/include/asm-x86/msr-index.h |    1 +
 xen/include/asm-x86/pqos.h      |    1 +
 4 files changed, 21 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 2656204..9eeedf0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1372,6 +1372,8 @@ static void __context_switch(void)
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
+        if ( system_supports_cqm() && cqm->used_rmid )
+            cqm_assoc_rmid(0);
         p->arch.ctxt_switch_from(p);
     }
 
@@ -1396,6 +1398,9 @@ static void __context_switch(void)
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
+
+        if ( system_supports_cqm() && n->domain->arch.pqos_cqm_rmid > 0 )
+            cqm_assoc_rmid(n->domain->arch.pqos_cqm_rmid);
     }
 
     gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
diff --git a/xen/arch/x86/pqos.c b/xen/arch/x86/pqos.c
index 2cde56e..7369e10 100644
--- a/xen/arch/x86/pqos.c
+++ b/xen/arch/x86/pqos.c
@@ -62,6 +62,7 @@ static void __init parse_pqos_param(char *s)
 custom_param("pqos", parse_pqos_param);
 
 struct pqos_cqm __read_mostly *cqm = NULL;
+static uint64_t __read_mostly rmid_mask;
 
 static void __init init_cqm(void)
 {
@@ -135,6 +136,8 @@ static void __init init_qos_monitor(void)
 
     cpuid_count(0xf, 0, &eax, &ebx, &ecx, &qm_features);
 
+    rmid_mask = ~(~0ull << get_count_order(ebx));
+
     if ( opt_cqm && (qm_features & QOS_MONITOR_TYPE_L3) )
         init_cqm();
 }
@@ -248,6 +251,17 @@ void get_cqm_info(const cpumask_t *cpu_cqmdata_map)
 
 }
 
+void cqm_assoc_rmid(unsigned int rmid)
+{
+    uint64_t val;
+    uint64_t new_val;
+
+    rdmsrl(MSR_IA32_PQR_ASSOC, val);
+    new_val = (val & ~rmid_mask) | (rmid & rmid_mask);
+    if ( val != new_val )
+        wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index e3ff10c..13800e6 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -492,5 +492,6 @@
 /* Platform QoS register */
 #define MSR_IA32_QOSEVTSEL             0x00000c8d
 #define MSR_IA32_QMC                   0x00000c8e
+#define MSR_IA32_PQR_ASSOC             0x00000c8f
 
 #endif /* __ASM_MSR_INDEX_H */
diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
index 4372af6..87820d5 100644
--- a/xen/include/asm-x86/pqos.h
+++ b/xen/include/asm-x86/pqos.h
@@ -54,5 +54,6 @@ void init_platform_qos(void);
 int alloc_cqm_rmid(struct domain *d);
 void free_cqm_rmid(struct domain *d);
 void get_cqm_info(const cpumask_t *cpu_cqmdata_map);
+void cqm_assoc_rmid(unsigned int rmid);
 
 #endif
-- 
1.7.9.5

* [PATCH v9 5/6] xsm: add platform QoS related xsm policies
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Add xsm policies for the attach/detach pqos service hypercalls and the get
CQM info hypercall.

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    5 ++++-
 xen/xsm/flask/hooks.c                        |    8 ++++++++
 xen/xsm/flask/policy/access_vectors          |   17 ++++++++++++++---
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index dedc035..1f683af 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -49,7 +49,7 @@ define(`create_domain_common', `
 			getdomaininfo hypercall setvcpucontext setextvcpucontext
 			getscheduler getvcpuinfo getvcpuextstate getaddrsize
 			getaffinity setaffinity };
-	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim  set_max_evtchn };
+	allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim set_max_evtchn pqos_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bb59fe8..115fcfe 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
 	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
 	tmem_control getscheduler setscheduler
 };
+allow dom0_t xen_t:xen2 {
+	pqos_op
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -76,7 +79,7 @@ allow dom0_t dom0_t:domain {
 	getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-	set_cpuid gettsc settsc setscheduler set_max_evtchn
+	set_cpuid gettsc settsc setscheduler set_max_evtchn pqos_op
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 7cdef04..6ee7771 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -730,6 +730,10 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_set_max_evtchn:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_MAX_EVTCHN);
 
+    case XEN_DOMCTL_attach_pqos:
+    case XEN_DOMCTL_detach_pqos:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PQOS_OP);
+
     default:
         printk("flask_domctl: Unknown op %d\n", cmd);
         return -EPERM;
@@ -785,6 +789,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_numainfo:
         return domain_has_xen(current->domain, XEN__PHYSINFO);
 
+    case XEN_SYSCTL_getcqminfo:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__PQOS_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 1fbe241..91af8b2 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -3,9 +3,9 @@
 #
 # class class_name { permission_name ... }
 
-# Class xen consists of dom0-only operations dealing with the hypervisor itself.
-# Unless otherwise specified, the source is the domain executing the hypercall,
-# and the target is the xen initial sid (type xen_t).
+# Classes xen and xen2 consist of dom0-only operations dealing with the
+# hypervisor itself. Unless otherwise specified, the source is the domain
+# executing the hypercall, and the target is the xen initial sid (type xen_t).
 class xen
 {
 # XENPF_settime
@@ -75,6 +75,14 @@ class xen
     setscheduler
 }
 
+# This is a continuation of class xen, since only 32 permissions can be
+# defined per class
+class xen2
+{
+# XEN_SYSCTL_getcqminfo
+    pqos_op
+}
+
 # Classes domain and domain2 consist of operations that a domain performs on
 # another domain or on itself.  Unless otherwise specified, the source is the
 # domain executing the hypercall, and the target is the domain being operated on
@@ -196,6 +204,9 @@ class domain2
     setclaim
 # XEN_DOMCTL_set_max_evtchn
     set_max_evtchn
+# XEN_DOMCTL_attach_pqos
+# XEN_DOMCTL_detach_pqos
+    pqos_op
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.7.9.5

* [PATCH v9 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
@ 2014-02-19  6:32 Dongxiao Xu
From: Dongxiao Xu @ 2014-02-19  6:32 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	Ian.Jackson, JBeulich, dgdegra

Introduce two new xl commands to attach/detach the CQM service for a guest:
$ xl pqos-attach cqm domid
$ xl pqos-detach cqm domid

Introduce one new xl command to retrieve guest CQM information:
$ xl pqos-list cqm

Signed-off-by: Jiongxi Li <jiongxi.li@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 docs/man/xl.pod.1           |   23 ++++++++
 tools/libxc/xc_domain.c     |   36 ++++++++++++
 tools/libxc/xenctrl.h       |   12 ++++
 tools/libxl/Makefile        |    3 +-
 tools/libxl/libxl.h         |    4 ++
 tools/libxl/libxl_pqos.c    |  132 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |    7 +++
 tools/libxl/xl.h            |    3 +
 tools/libxl/xl_cmdimpl.c    |  111 ++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c   |   15 +++++
 10 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_pqos.c

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index e7b9de2..c1b1acd 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1334,6 +1334,29 @@ Load FLASK policy from the given policy file. The initial policy is provided to
 the hypervisor as a multiboot module; this command allows runtime updates to the
 policy. Loading new security policy will reset runtime changes to device labels.
 
+=head1 PLATFORM QOS
+
+Newer Intel processors may offer a monitoring capability in each logical
+processor to measure specific quality-of-service metrics, for example, Cache
+QoS Monitoring to get the L3 cache occupancy.
+
+=over 4 
+
+=item B<pqos-attach> [I<qos-type>] [I<domain-id>]
+
+Attach a certain platform QoS service to a domain.
+Current supported I<qos-type> is: "cqm".
+
+=item B<pqos-detach> [I<qos-type>] [I<domain-id>]
+
+Detach a certain platform QoS service from a domain.
+Current supported I<qos-type> is: "cqm".
+
+=item B<pqos-list> [I<qos-type>]
+
+List platform QoS information for QoS attached domains.
+Current supported I<qos-type> is: "cqm".
+
 =back
 
 =head1 TO BE DOCUMENTED
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index c2fdd74..67b41e7 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1776,6 +1776,42 @@ int xc_domain_set_max_evtchn(xc_interface *xch, uint32_t domid,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_pqos_attach(xc_interface *xch, uint32_t domid, uint64_t flags)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_attach_pqos;
+    domctl.domain = (domid_t)domid;
+    domctl.u.qos_type.flags = flags;
+    return do_domctl(xch, &domctl);
+}
+
+int xc_domain_pqos_detach(xc_interface *xch, uint32_t domid, uint64_t flags)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_detach_pqos;
+    domctl.domain = (domid_t)domid;
+    domctl.u.qos_type.flags = flags;
+    return do_domctl(xch, &domctl);
+}
+
+int xc_domain_getcqminfo(xc_interface *xch, xc_cqminfo_t *info)
+{
+    int ret;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_getcqminfo;
+    ret = xc_sysctl(xch, &sysctl);
+    if ( ret >= 0 )
+    {
+        info->buffer_mfn = sysctl.u.getcqminfo.buffer_mfn;
+        info->size = sysctl.u.getcqminfo.size;
+        info->nr_rmids = sysctl.u.getcqminfo.nr_rmids;
+        info->nr_sockets = sysctl.u.getcqminfo.nr_sockets;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..f4eb198 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2427,4 +2427,16 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
  */
 int xc_kexec_unload(xc_interface *xch, int type);
 
+struct xc_cqminfo
+{
+    uint64_aligned_t buffer_mfn;
+    uint32_t size;
+    uint32_t nr_rmids;
+    uint32_t nr_sockets;
+};
+typedef struct xc_cqminfo xc_cqminfo_t;
+
+int xc_domain_pqos_attach(xc_interface *xch, uint32_t domid, uint64_t flags);
+int xc_domain_pqos_detach(xc_interface *xch, uint32_t domid, uint64_t flags);
+int xc_domain_getcqminfo(xc_interface *xch, xc_cqminfo_t *info);
 #endif /* XENCTRL_H */
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index d8495bb..8beb7f8 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -76,7 +76,8 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
 			libxl_json.o libxl_aoutils.o libxl_numa.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
-			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
+			libxl_qmp.o libxl_event.o libxl_fork.o libxl_pqos.o \
+			$(LIBXL_OBJS-y)
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
 $(LIBXL_OBJS): CFLAGS += $(CFLAGS_LIBXL) -include $(XEN_ROOT)/tools/config.h
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 12d6c31..f3d2202 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1105,6 +1105,10 @@ int libxl_flask_getenforce(libxl_ctx *ctx);
 int libxl_flask_setenforce(libxl_ctx *ctx, int mode);
 int libxl_flask_loadpolicy(libxl_ctx *ctx, void *policy, uint32_t size);
 
+int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);
+int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);
+void libxl_map_cqm_buffer(libxl_ctx *ctx, libxl_cqminfo *xlinfo);
+
 /* misc */
 
 /* Each of these sets or clears the flag according to whether the
diff --git a/tools/libxl/libxl_pqos.c b/tools/libxl/libxl_pqos.c
new file mode 100644
index 0000000..664a891
--- /dev/null
+++ b/tools/libxl/libxl_pqos.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (C) 2014      Intel Corporation
+ * Author Jiongxi Li <jiongxi.li@intel.com>
+ * Author Dongxiao Xu <dongxiao.xu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+
+static const char * const msg[] = {
+    [EINVAL] = "invalid QoS resource type! Supported types: \"cqm\"",
+    [ENODEV] = "CQM is not supported in this system.",
+    [EEXIST] = "CQM is already attached to this domain.",
+    [ENOENT] = "CQM is not attached to this domain.",
+    [EUSERS] = "there is no free CQM RMID available.",
+    [ESRCH]  = "is this Domain ID valid?",
+};
+
+static void libxl_pqos_err_msg(libxl_ctx *ctx, int err)
+{
+    GC_INIT(ctx);
+
+    switch (err) {
+    case EINVAL:
+    case ENODEV:
+    case EEXIST:
+    case EUSERS:
+    case ESRCH:
+    case ENOENT:
+        LOGE(ERROR, "%s", msg[err]);
+        break;
+    default:
+        LOGE(ERROR, "errno: %d", err);
+    }
+
+    GC_FREE;
+}
+
+static int libxl_pqos_type2flags(const char * qos_type, uint64_t *flags)
+{
+    int rc = 0;
+
+    if (!strcmp(qos_type, "cqm"))
+        *flags |= XEN_DOMCTL_pqos_cqm;
+    else
+        rc = -1;
+
+    return rc;
+}
+
+int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, const char * qos_type)
+{
+    int rc;
+    uint64_t flags = 0;
+
+    rc = libxl_pqos_type2flags(qos_type, &flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, EINVAL);
+        return ERROR_FAIL;
+    }
+
+    rc = xc_domain_pqos_attach(ctx->xch, domid, flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, const char * qos_type)
+{
+    int rc;
+    uint64_t flags = 0;
+
+    rc = libxl_pqos_type2flags(qos_type, &flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, EINVAL);
+        return ERROR_FAIL;
+    }
+
+    rc = xc_domain_pqos_detach(ctx->xch, domid, flags);
+    if (rc < 0) {
+        libxl_pqos_err_msg(ctx, errno);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
+void libxl_map_cqm_buffer(libxl_ctx *ctx, libxl_cqminfo *xlinfo)
+{
+    int ret;
+    xc_cqminfo_t xcinfo;
+    GC_INIT(ctx);
+
+    ret = xc_domain_getcqminfo(ctx->xch, &xcinfo);
+    if (ret < 0) {
+        LOGE(ERROR, "getting domain cqm info");
+        return;
+    }
+
+    xlinfo->buffer_virt = (uint64_t)xc_map_foreign_range(ctx->xch, DOMID_XEN,
+                              xcinfo.size, PROT_READ, xcinfo.buffer_mfn);
+    if (xlinfo->buffer_virt == 0) {
+        LOGE(ERROR, "Failed to map cqm buffers");
+        return;
+    }
+    xlinfo->size = xcinfo.size;
+    xlinfo->nr_rmids = xcinfo.nr_rmids;
+    xlinfo->nr_sockets = xcinfo.nr_sockets;
+
+    GC_FREE;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..43c0f48 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -596,3 +596,10 @@ libxl_event = Struct("event",[
                                  ])),
            ("domain_create_console_available", Struct(None, [])),
            ]))])
+
+libxl_cqminfo = Struct("cqminfo", [
+    ("buffer_virt",    uint64),
+    ("size",           uint32),
+    ("nr_rmids",       uint32),
+    ("nr_sockets",     uint32),
+    ])
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index c876a33..4362b96 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -106,6 +106,9 @@ int main_setenforce(int argc, char **argv);
 int main_loadpolicy(int argc, char **argv);
 int main_remus(int argc, char **argv);
 int main_devd(int argc, char **argv);
+int main_pqosattach(int argc, char **argv);
+int main_pqosdetach(int argc, char **argv);
+int main_pqoslist(int argc, char **argv);
 
 void help(const char *command);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index d93e01b..2e0b822 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7364,6 +7364,117 @@ out:
     return ret;
 }
 
+int main_pqosattach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, rc;
+    const char *qos_type = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-attach", 2) {
+        /* No options */
+    }
+
+    qos_type = argv[optind];
+    domid = find_domain(argv[optind + 1]);
+
+    rc = libxl_pqos_attach(ctx, domid, qos_type);
+
+    return rc;
+}
+
+int main_pqosdetach(int argc, char **argv)
+{
+    uint32_t domid;
+    int opt, rc;
+    const char *qos_type = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-detach", 2) {
+        /* No options */
+    }
+
+    qos_type = argv[optind];
+    domid = find_domain(argv[optind + 1]);
+
+    rc = libxl_pqos_detach(ctx, domid, qos_type);
+
+    return rc;
+}
+
+static void print_cqm_info(const libxl_cqminfo *info)
+{
+    unsigned int i, j, k;
+    char *domname;
+    int print_header;
+    int cqm_domains = 0;
+    uint16_t *rmid_to_dom;
+    uint64_t *l3c_data;
+    uint32_t first_domain = 0;
+    unsigned int num_domains = 1024;
+
+    if (info->nr_rmids == 0) {
+        printf("System doesn't support CQM.\n");
+        return;
+    }
+
+    print_header = 1;
+    l3c_data = (uint64_t *)(info->buffer_virt);
+    rmid_to_dom = (uint16_t *)(info->buffer_virt +
+                  info->nr_sockets * info->nr_rmids * sizeof(uint64_t));
+    for (i = first_domain; i < (first_domain + num_domains); i++) {
+        for (j = 0; j < info->nr_rmids; j++) {
+            if (rmid_to_dom[j] != i)
+                continue;
+
+            if (print_header) {
+                printf("Name                                        ID");
+                for (k = 0; k < info->nr_sockets; k++)
+                    printf("\tSocketID\tL3C_Usage");
+                print_header = 0;
+            }
+
+            domname = libxl_domid_to_name(ctx, i);
+            printf("\n%-40s %5d", domname, i);
+            free(domname);
+            cqm_domains++;
+
+            for (k = 0; k < info->nr_sockets; k++)
+                printf("%10u %16lu     ",
+                        k, l3c_data[info->nr_rmids * k + j]);
+        }
+    }
+    if (!cqm_domains)
+        printf("No RMID is assigned to domains.\n");
+    else
+        printf("\n");
+
+    printf("\nRMID count %5d\tRMID available %5d\n",
+           info->nr_rmids, info->nr_rmids - cqm_domains - 1);
+}
+
+int main_pqoslist(int argc, char **argv)
+{
+    int opt;
+    const char *qos_type = NULL;
+
+    SWITCH_FOREACH_OPT(opt, "", NULL, "pqos-list", 1) {
+        /* No options */
+    }
+
+    qos_type = argv[optind];
+
+    if (!strcmp(qos_type, "cqm")) {
+        libxl_cqminfo info;
+        libxl_map_cqm_buffer(ctx, &info);
+        print_cqm_info(&info);
+    } else {
+        fprintf(stderr, "QoS resource type supported is: cqm.\n");
+        help("pqos-list");
+        return 2;
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index ebe0220..d4af4a9 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -494,6 +494,21 @@ struct cmd_spec cmd_table[] = {
       "[options]",
       "-F                      Run in the foreground",
     },
+    { "pqos-attach",
+      &main_pqosattach, 0, 1,
+      "Allocate and map qos resource",
+      "<Resource> <Domain>",
+    },
+    { "pqos-detach",
+      &main_pqosdetach, 0, 1,
+      "Reliquish qos resource",
+      "<Resource> <Domain>",
+    },
+    { "pqos-list",
+      &main_pqoslist, 0, 0,
+      "List qos information for all domains",
+      "<Resource>",
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
-- 
1.7.9.5

* Re: [PATCH v9 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
@ 2014-02-19  9:39 Ian Campbell
From: Ian Campbell @ 2014-02-19  9:39 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, stefano.stabellini, andrew.cooper3, Ian.Jackson, xen-devel,
	JBeulich, dgdegra

On Wed, 2014-02-19 at 14:32 +0800, Dongxiao Xu wrote:
> +=item B<pqos-attach> [I<qos-type>] [I<domain-id>]
> +
> +Attach a certain platform QoS service to a domain.
> +Current supported I<qos-type> is: "cqm".

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 12d6c31..f3d2202 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1105,6 +1105,10 @@ int libxl_flask_getenforce(libxl_ctx *ctx);
>  int libxl_flask_setenforce(libxl_ctx *ctx, int mode);
>  int libxl_flask_loadpolicy(libxl_ctx *ctx, void *policy, uint32_t size);
>  
> +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);
> +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);

I have a feeling that qos_type should actually be an enum in the IDL.
The xl functions can probably use the autogenerated
libxl_BLAH_from_string() functions to help with parsing.

What other qos types are you envisaging? Is it valid to enable or
disable multiple such things independently?
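
To illustrate, assuming a hypothetical libxl_pqos_type enum (with a
single "cqm" value for now) were added to the IDL, the xl side could
lean on the generated parser; a sketch only:

    libxl_pqos_type type;

    if (libxl_pqos_type_from_string(argv[optind], &type)) {
        fprintf(stderr, "Unknown QoS resource type: %s\n", argv[optind]);
        help("pqos-attach");
        return 2;
    }
    /* 'type' would then be passed to the enum-based libxl call. */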

> +void libxl_map_cqm_buffer(libxl_ctx *ctx, libxl_cqminfo *xlinfo);

So each qos type is going to come with its own map function?

I don't see the LIBXL_HAVE #define which we discussed last time anywhere
here.

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 649ce50..43c0f48 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -596,3 +596,10 @@ libxl_event = Struct("event",[
>                                   ])),
>             ("domain_create_console_available", Struct(None, [])),
>             ]))])
> +
> +libxl_cqminfo = Struct("cqminfo", [
> +    ("buffer_virt",    uint64),

An opaque void * masquerading as an integer is not a suitable interface.

This should be a (pointer to a) struct of the appropriate type, or an
Array of such types etc (or possibly several such arrays depending on
what you are returning).

I haven't looked in detail at what is actually in this buffer, but
please try and have libxl bake it into a more consumable form -- e.g. an
array of per-domain properties or something rather than a raw list.
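
To be concrete, one consumable shape (purely illustrative, not a
concrete proposal) would be an array of per-(domain, socket) samples:

    /* Hypothetical baked-out form of the buffer contents. */
    typedef struct {
        uint32_t domid;
        uint32_t socket;
        uint64_t l3c_usage;   /* L3 occupancy in bytes */
    } libxl_cqm_sample;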

> +    ("size",           uint32),
> +    ("nr_rmids",       uint32),
> +    ("nr_sockets",     uint32),
> +    ])
> [...][
> +static void print_cqm_info(const libxl_cqminfo *info)
> +{
> +    unsigned int i, j, k;
> +    char *domname;
> +    int print_header;
> +    int cqm_domains = 0;
> +    uint16_t *rmid_to_dom;
> +    uint64_t *l3c_data;
> +    uint32_t first_domain = 0;
> +    unsigned int num_domains = 1024;
> +
> +    if (info->nr_rmids == 0) {
> +        printf("System doesn't support CQM.\n");
> +        return;
> +    }
> +
> +    print_header = 1;
> +    l3c_data = (uint64_t *)(info->buffer_virt);
> +    rmid_to_dom = (uint16_t *)(info->buffer_virt +
> +                  info->nr_sockets * info->nr_rmids * sizeof(uint64_t));
> +    for (i = first_domain; i < (first_domain + num_domains); i++) {
> +        for (j = 0; j < info->nr_rmids; j++) {
> +            if (rmid_to_dom[j] != i)
> +                continue;
> +
> +            if (print_header) {
> +                printf("Name                                        ID");
> +                for (k = 0; k < info->nr_sockets; k++)
> +                    printf("\tSocketID\tL3C_Usage");
> +                print_header = 0;
> +            }
> +
> +            domname = libxl_domid_to_name(ctx, i);
> +            printf("\n%-40s %5d", domname, i);
> +            free(domname);
> +            cqm_domains++;
> +
> +            for (k = 0; k < info->nr_sockets; k++)
> +                printf("%10u %16lu     ",
> +                        k, l3c_data[info->nr_rmids * k + j]);
> +        }

This should be transformed into a sensible interface within libxl so
that it can be consumed in a straightforward manner by the users of
libxl, rather than asking them all to reimplement this.

Is the buffer format considered a frozen ABI?

> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index ebe0220..d4af4a9 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -494,6 +494,21 @@ struct cmd_spec cmd_table[] = {
>        "[options]",
>        "-F                      Run in the foreground",
>      },
> +    { "pqos-attach",
> +      &main_pqosattach, 0, 1,
> +      "Allocate and map qos resource",
> +      "<Resource> <Domain>",
> +    },
> +    { "pqos-detach",
> +      &main_pqosdetach, 0, 1,
> +      "Reliquish qos resource",

"Relinquish"

and perhaps "resources" (in both cases)

> +      "<Resource> <Domain>",
> +    },
> +    { "pqos-list",
> +      &main_pqoslist, 0, 0,
> +      "List qos information for all domains",
> +      "<Resource>",
> +    },
>  };
>  
>  int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-02-19  6:32 ` [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
@ 2014-02-24 14:12   ` Jan Beulich
  2014-03-03 13:21     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-02-24 14:12 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> +struct pqos_cqm __read_mostly *cqm = NULL;

Misplaced __read_mostly (belongs after the *).

> +static void __init init_cqm(void)
> +{
> +    unsigned int rmid;
> +    unsigned int eax, edx;
> +    unsigned int cqm_pages;
> +    unsigned int i;
> +
> +    if ( !opt_cqm_max_rmid )
> +        return;
> +
> +    cqm = xzalloc(struct pqos_cqm);
> +    if ( !cqm )
> +        return;
> +
> +    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->max_rmid, &edx);
> +    if ( !(edx & QOS_MONITOR_EVTID_L3) )
> +        goto out;
> +
> +    cqm->min_rmid = 1;
> +    cqm->max_rmid = min(opt_cqm_max_rmid, cqm->max_rmid);
> +
> +    cqm->rmid_to_dom = xmalloc_array(domid_t, cqm->max_rmid + 1);
> +    if ( !cqm->rmid_to_dom )
> +        goto out;
> +
> +    /* Reserve RMID 0 for all domains not being monitored */
> +    cqm->rmid_to_dom[0] = DOMID_XEN;
> +    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
> +        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
> +
> +    /* Allocate CQM buffer size in initialization stage */
> +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
> +                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/

Does this really need to be NR_CPUS (rather than nr_cpu_ids)?

> +                PAGE_SIZE + 1;
> +    cqm->buffer_size = cqm_pages * PAGE_SIZE;
> +
> +    cqm->buffer = alloc_xenheap_pages(get_order_from_pages(cqm_pages), 0);

And does the allocation really need to be physically contiguous?
If so - did you calculate how much more memory you allocate
(due to the rounding up to the next power of 2), to decide
whether it's worthwhile freeing the unused portion?
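
For reference, the usual trick for trimming the power-of-2 rounding is
to free the tail pages of the order-sized allocation; a sketch only:

    unsigned int order = get_order_from_pages(cqm_pages);
    void *buf = alloc_xenheap_pages(order, 0);

    if ( buf )
        for ( i = cqm_pages; i < (1u << order); i++ )
            free_xenheap_page(buf + i * PAGE_SIZE);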

> +    if ( !cqm->buffer )
> +    {
> +        xfree(cqm->rmid_to_dom);
> +        goto out;
> +    }
> +    memset(cqm->buffer, 0, cqm->buffer_size);
> +
> +    for ( i = 0; i < cqm_pages; i++ )
> +        share_xen_page_with_privileged_guests(
> +            virt_to_page((void *)((unsigned long)cqm->buffer + i * PAGE_SIZE)),

virt_to_page((void *)cqm->buffer + i * PAGE_SIZE)

> +static void __init init_qos_monitor(void)
> +{
> +    unsigned int qm_features;
> +    unsigned int eax, ebx, ecx;
> +
> +    if ( !(boot_cpu_has(X86_FEATURE_QOSM)) )

Pointless pair of parentheses.

> +        return;
> +
> +    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &qm_features);
> +
> +    if ( opt_cqm && (qm_features & QOS_MONITOR_TYPE_L3) )
> +        init_cqm();
> +}
> +
> +void __init init_platform_qos(void)
> +{
> +    if ( !opt_pqos )
> +        return;
> +
> +    init_qos_monitor();
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index b49256d..639528f 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -48,6 +48,7 @@
>  #include <asm/setup.h>
>  #include <xen/cpu.h>
>  #include <asm/nmi.h>
> +#include <asm/pqos.h>
>  
>  /* opt_nosmp: If true, secondary processors are ignored. */
>  static bool_t __initdata opt_nosmp;
> @@ -1419,6 +1420,8 @@ void __init __start_xen(unsigned long mbi_p)
>  
>      domain_unpause_by_systemcontroller(dom0);
>  
> +    init_platform_qos();
> +
>      reset_stack_and_jump(init_done);
>  }
>  
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index 1cfaf94..ca59668 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -147,6 +147,7 @@
>  #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
>  #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
>  #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
> +#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring capability */
>  #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
>  #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
>  
> diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
> new file mode 100644
> index 0000000..0a8065c
> --- /dev/null
> +++ b/xen/include/asm-x86/pqos.h
> @@ -0,0 +1,43 @@
> +/*
> + * pqos.h: Platform QoS related service for guest.
> + *
> + * Copyright (c) 2014, Intel Corporation
> + * Author: Jiongxi Li  <jiongxi.li@intel.com>
> + * Author: Dongxiao Xu <dongxiao.xu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#ifndef ASM_PQOS_H
> +#define ASM_PQOS_H
> +
> +#include <public/xen.h>
> +#include <xen/spinlock.h>
> +
> +/* QoS Resource Type Enumeration */
> +#define QOS_MONITOR_TYPE_L3            0x2
> +
> +/* QoS Monitoring Event ID */
> +#define QOS_MONITOR_EVTID_L3           0x1
> +
> +struct pqos_cqm {
> +    spinlock_t cqm_lock;
> +    uint64_t *buffer;
> +    unsigned int min_rmid;
> +    unsigned int max_rmid;
> +    unsigned int used_rmid;
> +    unsigned int upscaling_factor;
> +    unsigned int buffer_size;
> +    domid_t *rmid_to_dom;
> +};
> +extern struct pqos_cqm *cqm;
> +
> +void init_platform_qos(void);
> +
> +#endif
> -- 
> 1.7.9.5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 2/6] x86: dynamically attach/detach CQM service for a guest
  2014-02-19  6:32 ` [PATCH v9 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
@ 2014-02-24 14:15   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-02-24 14:15 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> @@ -1245,6 +1246,33 @@ long arch_do_domctl(
>      }
>      break;
>  
> +    case XEN_DOMCTL_attach_pqos:
> +    {
> +        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
> +            ret = -EINVAL;
> +        else if ( !system_supports_cqm() )
> +            ret = -ENODEV;
> +        else
> +            ret = alloc_cqm_rmid(d);
> +    }

Pointless curly braces.

> +    case XEN_DOMCTL_detach_pqos:
> +    {
> +        if ( !(domctl->u.qos_type.flags & XEN_DOMCTL_pqos_cqm) )
> +            ret = -EINVAL;
> +        else if ( !system_supports_cqm() )
> +            ret = -ENODEV;
> +        else if ( d->arch.pqos_cqm_rmid > 0 )
> +        {
> +            free_cqm_rmid(d);
> +            ret = 0;
> +        }
> +        else
> +            ret = -ENOENT;
> +    }

Again.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 3/6] x86: collect CQM information from all sockets
  2014-02-19  6:32 ` [PATCH v9 3/6] x86: collect CQM information from all sockets Dongxiao Xu
@ 2014-02-24 14:23   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-02-24 14:23 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> +static void read_cqm_data(void *arg)
> +{
> +    uint64_t cqm_data;
> +    unsigned int rmid;
> +    int socket = cpu_to_socket(smp_processor_id());
> +    unsigned long i;
> +
> +    ASSERT(system_supports_cqm());
> +
> +    if ( socket < 0 )
> +        return;
> +
> +    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
> +    {
> +        if ( cqm->rmid_to_dom[rmid] == DOMID_INVALID )
> +            continue;
> +
> +        wrmsr(MSR_IA32_QOSEVTSEL, QOS_MONITOR_EVTID_L3, rmid);
> +        rdmsrl(MSR_IA32_QMC, cqm_data);
> +
> +        i = (unsigned long)(cqm->max_rmid + 1) * socket + rmid;
> +        if ( !(cqm_data & IA32_QM_CTR_ERROR_MASK) )
> +            cqm->buffer[i] = cqm_data * cqm->upscaling_factor;

So my earlier comment regarding the NR_CPUS use in the allocation
of this buffer becomes even more relevant with the fact that you're
indexing by socket here, not by CPU - in that case, even nr_cpu_ids
is likely to be a gross overestimation.

> +static void select_socket_cpu(cpumask_t *cpu_bitmap)
> +{
> +    int i;
> +    unsigned int cpu;
> +    int socket, socket_curr = cpu_to_socket(smp_processor_id());
> +    DECLARE_BITMAP(sockets, NR_CPUS);

Please avoid putting a 4095-bit bitmap on the stack.
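
One way to avoid that - a sketch only - is a heap-allocated bitmap:

    unsigned long *sockets = xzalloc_array(unsigned long,
                                           BITS_TO_LONGS(nr_cpu_ids));

    if ( !sockets )
        return;
    /* ... __set_bit()/test_bit() on sockets as before ... */
    xfree(sockets);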

> +
> +    bitmap_zero(sockets, NR_CPUS);
> +    if (socket_curr >= 0)

Coding style.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 4/6] x86: enable CQM monitoring for each domain RMID
  2014-02-19  6:32 ` [PATCH v9 4/6] x86: enable CQM monitoring for each domain RMID Dongxiao Xu
@ 2014-02-24 14:26   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-02-24 14:26 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir, Ian.Campbell, andrew.cooper3, stefano.stabellini,
	Ian.Jackson, xen-devel, dgdegra

>>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> +void cqm_assoc_rmid(unsigned int rmid)
> +{
> +    uint64_t val;
> +    uint64_t new_val;
> +
> +    rdmsrl(MSR_IA32_PQR_ASSOC, val);
> +    new_val = (val & ~rmid_mask) | (rmid & rmid_mask);
> +    if ( val != new_val )
> +        wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
> +}

Considering that even the addition of two RDMSRs in the context
switch path is relatively expensive, I think you will want to track the
most recently written value in a per-CPU variable.
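
A minimal sketch of that (the per-CPU variable name is illustrative;
it assumes PQR_ASSOC starts out zeroed on each CPU so the cache is
initially in sync):

    static DEFINE_PER_CPU(uint64_t, pqr_assoc_cache);

    void cqm_assoc_rmid(unsigned int rmid)
    {
        uint64_t *cur = &this_cpu(pqr_assoc_cache);
        uint64_t new_val = (*cur & ~rmid_mask) | (rmid & rmid_mask);

        /* Skip both the RDMSR and the WRMSR when nothing changes. */
        if ( *cur != new_val )
        {
            wrmsrl(MSR_IA32_PQR_ASSOC, new_val);
            *cur = new_val;
        }
    }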

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-02-24 14:12   ` Jan Beulich
@ 2014-03-03 13:21     ` Xu, Dongxiao
  2014-03-04  8:10       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-03 13:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, February 24, 2014 10:13 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> > +struct pqos_cqm __read_mostly *cqm = NULL;
> 
> Misplaced __read_mostly (belongs after the *).

Okay.

> 
> > +static void __init init_cqm(void)
> > +{
> > +    unsigned int rmid;
> > +    unsigned int eax, edx;
> > +    unsigned int cqm_pages;
> > +    unsigned int i;
> > +
> > +    if ( !opt_cqm_max_rmid )
> > +        return;
> > +
> > +    cqm = xzalloc(struct pqos_cqm);
> > +    if ( !cqm )
> > +        return;
> > +
> > +    cpuid_count(0xf, 1, &eax, &cqm->upscaling_factor, &cqm->max_rmid,
> &edx);
> > +    if ( !(edx & QOS_MONITOR_EVTID_L3) )
> > +        goto out;
> > +
> > +    cqm->min_rmid = 1;
> > +    cqm->max_rmid = min(opt_cqm_max_rmid, cqm->max_rmid);
> > +
> > +    cqm->rmid_to_dom = xmalloc_array(domid_t, cqm->max_rmid + 1);
> > +    if ( !cqm->rmid_to_dom )
> > +        goto out;
> > +
> > +    /* Reserve RMID 0 for all domains not being monitored */
> > +    cqm->rmid_to_dom[0] = DOMID_XEN;
> > +    for ( rmid = cqm->min_rmid; rmid <= cqm->max_rmid; rmid++ )
> > +        cqm->rmid_to_dom[rmid] = DOMID_INVALID;
> > +
> > +    /* Allocate CQM buffer size in initialization stage */
> > +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
> > +                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/
> 
> Does this really need to be NR_CPUS (rather than nr_cpu_ids)?

Okay.
As you mentioned in a later comment, the CQM data is indexed per socket.
Here we use NR_CPUS or nr_cpu_ids because either is big enough to cover the possible number of sockets in the system (even considering the hotplug case).
Is there a better way to get the system's socket count (including sockets that currently have no CPU)?

> 
> > +                PAGE_SIZE + 1;
> > +    cqm->buffer_size = cqm_pages * PAGE_SIZE;
> > +
> > +    cqm->buffer =
> alloc_xenheap_pages(get_order_from_pages(cqm_pages), 0);
> 
> And does the allocation really need to be physically contiguous?
> If so - did you calculate how much more memory you allocate
> (due to the rounding up to the next power of 2), to decide
> whether it's worthwhile freeing the unused portion?

The buffer needs to be contiguous since userspace will map it read-only to avoid copying data.

According to my rough calculation, the cqm_pages value is about 29 pages on my test machine.


> 
> > +    if ( !cqm->buffer )
> > +    {
> > +        xfree(cqm->rmid_to_dom);
> > +        goto out;
> > +    }
> > +    memset(cqm->buffer, 0, cqm->buffer_size);
> > +
> > +    for ( i = 0; i < cqm_pages; i++ )
> > +        share_xen_page_with_privileged_guests(
> > +            virt_to_page((void *)((unsigned long)cqm->buffer + i *
> PAGE_SIZE)),
> 
> virt_to_page((void *)cqm->buffer + i * PAGE_SIZE)

Okay.

> 
> > +static void __init init_qos_monitor(void)
> > +{
> > +    unsigned int qm_features;
> > +    unsigned int eax, ebx, ecx;
> > +
> > +    if ( !(boot_cpu_has(X86_FEATURE_QOSM)) )
> 
> Pointless pair of parentheses.

Okay.

> 
> > +        return;
> > +
> > +    cpuid_count(0xf, 0, &eax, &ebx, &ecx, &qm_features);
> > +
> > +    if ( opt_cqm && (qm_features & QOS_MONITOR_TYPE_L3) )
> > +        init_cqm();
> > +}
> > +
> > +void __init init_platform_qos(void)
> > +{
> > +    if ( !opt_pqos )
> > +        return;
> > +
> > +    init_qos_monitor();
> > +}
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > index b49256d..639528f 100644
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -48,6 +48,7 @@
> >  #include <asm/setup.h>
> >  #include <xen/cpu.h>
> >  #include <asm/nmi.h>
> > +#include <asm/pqos.h>
> >
> >  /* opt_nosmp: If true, secondary processors are ignored. */
> >  static bool_t __initdata opt_nosmp;
> > @@ -1419,6 +1420,8 @@ void __init __start_xen(unsigned long mbi_p)
> >
> >      domain_unpause_by_systemcontroller(dom0);
> >
> > +    init_platform_qos();
> > +
> >      reset_stack_and_jump(init_done);
> >  }
> >
> > diff --git a/xen/include/asm-x86/cpufeature.h
> b/xen/include/asm-x86/cpufeature.h
> > index 1cfaf94..ca59668 100644
> > --- a/xen/include/asm-x86/cpufeature.h
> > +++ b/xen/include/asm-x86/cpufeature.h
> > @@ -147,6 +147,7 @@
> >  #define X86_FEATURE_ERMS	(7*32+ 9) /* Enhanced REP MOVSB/STOSB */
> >  #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context
> ID */
> >  #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional
> Memory */
> > +#define X86_FEATURE_QOSM	(7*32+12) /* Platform QoS monitoring
> capability */
> >  #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as
> zero */
> >  #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access
> Prevention */
> >
> > diff --git a/xen/include/asm-x86/pqos.h b/xen/include/asm-x86/pqos.h
> > new file mode 100644
> > index 0000000..0a8065c
> > --- /dev/null
> > +++ b/xen/include/asm-x86/pqos.h
> > @@ -0,0 +1,43 @@
> > +/*
> > + * pqos.h: Platform QoS related service for guest.
> > + *
> > + * Copyright (c) 2014, Intel Corporation
> > + * Author: Jiongxi Li  <jiongxi.li@intel.com>
> > + * Author: Dongxiao Xu <dongxiao.xu@intel.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + */
> > +#ifndef ASM_PQOS_H
> > +#define ASM_PQOS_H
> > +
> > +#include <public/xen.h>
> > +#include <xen/spinlock.h>
> > +
> > +/* QoS Resource Type Enumeration */
> > +#define QOS_MONITOR_TYPE_L3            0x2
> > +
> > +/* QoS Monitoring Event ID */
> > +#define QOS_MONITOR_EVTID_L3           0x1
> > +
> > +struct pqos_cqm {
> > +    spinlock_t cqm_lock;
> > +    uint64_t *buffer;
> > +    unsigned int min_rmid;
> > +    unsigned int max_rmid;
> > +    unsigned int used_rmid;
> > +    unsigned int upscaling_factor;
> > +    unsigned int buffer_size;
> > +    domid_t *rmid_to_dom;
> > +};
> > +extern struct pqos_cqm *cqm;
> > +
> > +void init_platform_qos(void);
> > +
> > +#endif
> > --
> > 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc
  2014-02-19  9:39   ` Ian Campbell
@ 2014-03-03 13:46     ` Xu, Dongxiao
  0 siblings, 0 replies; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-03 13:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir@xen.org, stefano.stabellini@eu.citrix.com,
	andrew.cooper3@citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, JBeulich@suse.com, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Wednesday, February 19, 2014 5:39 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; keir@xen.org; JBeulich@suse.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> andrew.cooper3@citrix.com; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov
> Subject: Re: [PATCH v9 6/6] tools: enable Cache QoS Monitoring feature for
> libxl/libxc
> 
> On Wed, 2014-02-19 at 14:32 +0800, Dongxiao Xu wrote:
> > +=item B<pqos-attach> [I<qos-type>] [I<domain-id>]
> > +
> > +Attach certain platform QoS service for a domain.
> > +Current supported I<qos-type> is: "cqm".
> 
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 12d6c31..f3d2202 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -1105,6 +1105,10 @@ int libxl_flask_getenforce(libxl_ctx *ctx);
> >  int libxl_flask_setenforce(libxl_ctx *ctx, int mode);
> >  int libxl_flask_loadpolicy(libxl_ctx *ctx, void *policy, uint32_t size);
> >
> > +int libxl_pqos_attach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);
> > +int libxl_pqos_detach(libxl_ctx *ctx, uint32_t domid, const char * qos_type);
> 
> I have a feeling that qos_type should actually be an enum in the IDL.
> The xl functions can probably use the autogenerate
> libxl_BLAH_from_string() functions to help with parsing.

Okay.


> 
> What other qos types are you envisaging? Is it valid to enable or
> disable multiple such things independently?

Yes, there are different types of QoS resources, and yes, they need to be handled separately.

> 
> > +void libxl_map_cqm_buffer(libxl_ctx *ctx, libxl_cqminfo *xlinfo);
> 
> So each qos type is going to come with its own map function?

Yes.

> 
> I don't see the LIBXL_HAVE #define which we discussed last time anywhere
> here.

Oh, I misunderstood your previous comments.
I will add it in the next patch version.

> 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 649ce50..43c0f48 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -596,3 +596,10 @@ libxl_event = Struct("event",[
> >                                   ])),
> >             ("domain_create_console_available", Struct(None, [])),
> >             ]))])
> > +
> > +libxl_cqminfo = Struct("cqminfo", [
> > +    ("buffer_virt",    uint64),
> 
> An opaque void * masquerading as an integer is not a suitable interface.
> 
> This should be a (pointer to a) struct of the appropriate type, or an
> Array of such types etc (or possibly several such arrays depending on
> what you are returning).
> 
> I haven't looked in detail into what is actually in this buffer, but
> please try and have libxl bake it into a more consumable form -- e.g. an
> array of per-domain properties or something rather than a raw list.

The buffer contains the L3 cache usage data for a given domain on a given socket, so there is no structure inside.

As for defining it as an array, we don't know its size until libxl_map_cqm_buffer() returns. :(

> 
> > +    ("size",           uint32),
> > +    ("nr_rmids",       uint32),
> > +    ("nr_sockets",     uint32),
> > +    ])
> > [...][
> > +static void print_cqm_info(const libxl_cqminfo *info)
> > +{
> > +    unsigned int i, j, k;
> > +    char *domname;
> > +    int print_header;
> > +    int cqm_domains = 0;
> > +    uint16_t *rmid_to_dom;
> > +    uint64_t *l3c_data;
> > +    uint32_t first_domain = 0;
> > +    unsigned int num_domains = 1024;
> > +
> > +    if (info->nr_rmids == 0) {
> > +        printf("System doesn't support CQM.\n");
> > +        return;
> > +    }
> > +
> > +    print_header = 1;
> > +    l3c_data = (uint64_t *)(info->buffer_virt);
> > +    rmid_to_dom = (uint16_t *)(info->buffer_virt +
> > +                  info->nr_sockets * info->nr_rmids * sizeof(uint64_t));
> > +    for (i = first_domain; i < (first_domain + num_domains); i++) {
> > +        for (j = 0; j < info->nr_rmids; j++) {
> > +            if (rmid_to_dom[j] != i)
> > +                continue;
> > +
> > +            if (print_header) {
> > +                printf("Name
> ID");
> > +                for (k = 0; k < info->nr_sockets; k++)
> > +                    printf("\tSocketID\tL3C_Usage");
> > +                print_header = 0;
> > +            }
> > +
> > +            domname = libxl_domid_to_name(ctx, i);
> > +            printf("\n%-40s %5d", domname, i);
> > +            free(domname);
> > +            cqm_domains++;
> > +
> > +            for (k = 0; k < info->nr_sockets; k++)
> > +                printf("%10u %16lu     ",
> > +                        k, l3c_data[info->nr_rmids * k + j]);
> > +        }
> 
> This should be transformed into a sensible interface within libxl so
> that it can be consumed in a straightforward manner by the users of
> libxl, rather than asking them all to reimplement this.

Okay.

> 	
> Is the buffer format considered a frozen ABI?

Yes, it is a fixed layout for CQM.

> 
> > diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> > index ebe0220..d4af4a9 100644
> > --- a/tools/libxl/xl_cmdtable.c
> > +++ b/tools/libxl/xl_cmdtable.c
> > @@ -494,6 +494,21 @@ struct cmd_spec cmd_table[] = {
> >        "[options]",
> >        "-F                      Run in the foreground",
> >      },
> > +    { "pqos-attach",
> > +      &main_pqosattach, 0, 1,
> > +      "Allocate and map qos resource",
> > +      "<Resource> <Domain>",
> > +    },
> > +    { "pqos-detach",
> > +      &main_pqosdetach, 0, 1,
> > +      "Reliquish qos resource",
> 
> "Relinquish"
> 
> and perhaps "resources" (in both cases)

Okay.

Thanks,
Dongxiao

> 
> > +      "<Resource> <Domain>",
> > +    },
> > +    { "pqos-list",
> > +      &main_pqoslist, 0, 0,
> > +      "List qos information for all domains",
> > +      "<Resource>",
> > +    },
> >  };
> >
> >  int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-03 13:21     ` Xu, Dongxiao
@ 2014-03-04  8:10       ` Jan Beulich
  2014-03-18  2:02         ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-04  8:10 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 03.03.14 at 14:21, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
>> > +    /* Allocate CQM buffer size in initialization stage */
>> > +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
>> > +                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/
>> 
>> Does this really need to be NR_CPUS (rather than nr_cpu_ids)?
> 
> Okay.
> As you mentioned in later comment, the CQM data is indexed per-socket.
> Here we use NR_CPUS or nr_cpu_ids because it is big enough to cover the 
> possible socket number in the system (even consider hotplug case).
> Is there a better way that we can get the system socket number (including 
> the case even there is no CPU in that socket)?

I think we should at least get the estimation as close as possible:
Count the sockets that we know of (i.e. that have at least one
core/thread) and add the number of "disabled" (hot-pluggable)
CPUs if ACPI doesn't surface enough information to associate
them with a socket (but I think MADT provides all the needed data).
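
Counting the sockets we know of could look roughly like this (a sketch
only, ignoring the hot-pluggable part):

    static unsigned int __init count_known_sockets(void)
    {
        unsigned int cpu, nr_sockets = 0;

        for_each_present_cpu ( cpu )
        {
            int socket = cpu_to_socket(cpu);

            /* Highest socket ID + 1, as an upper bound for indexing. */
            if ( socket >= 0 && socket + 1u > nr_sockets )
                nr_sockets = socket + 1;
        }

        return nr_sockets;
    }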

>> > +                PAGE_SIZE + 1;
>> > +    cqm->buffer_size = cqm_pages * PAGE_SIZE;
>> > +
>> > +    cqm->buffer =
>> alloc_xenheap_pages(get_order_from_pages(cqm_pages), 0);
>> 
>> And does the allocation really need to be physically contiguous?
>> If so - did you calculate how much more memory you allocate
>> (due to the rounding up to the next power of 2), to decide
>> whether it's worthwhile freeing the unused portion?
> 
> The buffer needs to be contiguous since userspace will map it as read-only to 
> avoid data copy.

That's not an argument for anything other than coding simplicity - user
space could equally well map an array of dis-contiguous MFNs.
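
A sketch of the per-page variant (illustrative; cqm->pages is a
hypothetical array of per-page pointers):

    for ( i = 0; i < cqm_pages; i++ )
    {
        void *p = alloc_xenheap_page();

        if ( !p )
            goto out_free;    /* hypothetical unwind path */
        clear_page(p);
        share_xen_page_with_privileged_guests(virt_to_page(p),
                                              XENSHARE_readonly);
        cqm->pages[i] = p;
    }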

> According to my rough calculation, the cqm_pages value is about 29 pages on 
> my test machine.

Your test machine doesn't count; what counts is the maximum
possible with the currently enforced limits.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-04  8:10       ` Jan Beulich
@ 2014-03-18  2:02         ` Xu, Dongxiao
  2014-03-18  9:57           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18  2:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, March 04, 2014 4:11 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 03.03.14 at 14:21, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> >> > +    /* Allocate CQM buffer size in initialization stage */
> >> > +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
> >> > +                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/
> >>
> >> Does this really need to be NR_CPUS (rather than nr_cpu_ids)?
> >
> > Okay.
> > As you mentioned in later comment, the CQM data is indexed per-socket.
> > Here we use NR_CPUS or nr_cpu_ids because it is big enough to cover the
> > possible socket number in the system (even consider hotplug case).
> > Is there a better way that we can get the system socket number (including
> > the case even there is no CPU in that socket)?
> 
> I think we should at least get the estimation as close as possible:
> Count the sockets that we know of (i.e. that have at least one
> core/thread) and add the number of "disabled" (hot-pluggable)
> CPUs if ACPI doesn't surface enough information to associate
> them with a socket (but I think MADT provides all the needed data).

It seems that the MADT table doesn't contain socket count information...

Considering that it is difficult to get an accurate socket count at system initialization time, what about allocating/freeing the CQM-related memory at runtime, when the administrator actually issues a QoS query command?
With this approach, the data shared between Xen and the Dom0 tools should be much smaller, since we know:
 - How many processor sockets are active in the system.
 - How many active RMIDs are in use in the system.

With the above, we no longer need to always use max_rmid * max_socket memory to share a lot of unnecessary "zero" data, and the newly shared data should be less than one page.
What's your opinion about that?

Thanks,
Dongxiao

> 
> >> > +                PAGE_SIZE + 1;
> >> > +    cqm->buffer_size = cqm_pages * PAGE_SIZE;
> >> > +
> >> > +    cqm->buffer =
> >> alloc_xenheap_pages(get_order_from_pages(cqm_pages), 0);
> >>
> >> And does the allocation really need to be physically contiguous?
> >> If so - did you calculate how much more memory you allocate
> >> (due to the rounding up to the next power of 2), to decide
> >> whether it's worthwhile freeing the unused portion?
> >
> > The buffer needs to be contiguous since userspace will map it as read-only to
> > avoid data copy.
> 
> That's not an argument for other that coding simplicity - user space
> could equally well map an array of dis-contiguous MFNs.
> 
> > According to my rough calculation, the cqm_pages value is about 29 pages on
> > my test machine.
> 
> Your test machine doesn't count; what counts is the maximum
> possible with the currently enforced limits.
> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18  2:02         ` Xu, Dongxiao
@ 2014-03-18  9:57           ` Jan Beulich
  2014-03-18 10:02             ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-18  9:57 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 03:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 03.03.14 at 14:21, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
>> >> > +    /* Allocate CQM buffer size in initialization stage */
>> >> > +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
>> >> > +                (cqm->max_rmid + 1) * sizeof(uint64_t) * NR_CPUS)/
>> >>
>> >> Does this really need to be NR_CPUS (rather than nr_cpu_ids)?
>> >
>> > Okay.
>> > As you mentioned in later comment, the CQM data is indexed per-socket.
>> > Here we use NR_CPUS or nr_cpu_ids because it is big enough to cover the
>> > possible socket number in the system (even consider hotplug case).
>> > Is there a better way that we can get the system socket number (including
>> > the case even there is no CPU in that socket)?
>> 
>> I think we should at least get the estimation as close as possible:
>> Count the sockets that we know of (i.e. that have at least one
>> core/thread) and add the number of "disabled" (hot-pluggable)
>> CPUs if ACPI doesn't surface enough information to associate
>> them with a socket (but I think MADT provides all the needed data).
> 
> It seems that MADT table doesn't contain the socket number information...
> 
> Considering that it is difficult to get the accurate socket number at system 
> initialization time, what about we allocate/free the CQM related memory at 
> runtime when user admin really issues the QoS query command?
> With this approach, the data sharing between Xen and Dom0 tools should be 
> much less since we know:
>  - How many processor sockets are active in the system.
>  - How many active RMIDs are in use in the system.
> 
> With above, we didn't need to always using the max_rmid * max_socket memory 
> to share a lot of unnecessary "zero" data, and the new sharing data should be 
> less than one page.
> What's your opinion about that?

If you can get this to work, that would seem like a pretty optimal
solution.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18  9:57           ` Jan Beulich
@ 2014-03-18 10:02             ` Xu, Dongxiao
  2014-03-18 10:09               ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18 10:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, March 18, 2014 5:57 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 18.03.14 at 03:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 03.03.14 at 14:21, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >>> On 19.02.14 at 07:32, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> >> >> > +    /* Allocate CQM buffer size in initialization stage */
> >> >> > +    cqm_pages = ((cqm->max_rmid + 1) * sizeof(domid_t) +
> >> >> > +                (cqm->max_rmid + 1) * sizeof(uint64_t) *
> NR_CPUS)/
> >> >>
> >> >> Does this really need to be NR_CPUS (rather than nr_cpu_ids)?
> >> >
> >> > Okay.
> >> > As you mentioned in later comment, the CQM data is indexed per-socket.
> >> > Here we use NR_CPUS or nr_cpu_ids because it is big enough to cover the
> >> > possible socket number in the system (even consider hotplug case).
> >> > Is there a better way that we can get the system socket number (including
> >> > the case even there is no CPU in that socket)?
> >>
> >> I think we should at least get the estimation as close as possible:
> >> Count the sockets that we know of (i.e. that have at least one
> >> core/thread) and add the number of "disabled" (hot-pluggable)
> >> CPUs if ACPI doesn't surface enough information to associate
> >> them with a socket (but I think MADT provides all the needed data).
> >
> > It seems that MADT table doesn't contain the socket number information...
> >
> > Considering that it is difficult to get the accurate socket number at system
> > initialization time, what about we allocate/free the CQM related memory at
> > runtime when user admin really issues the QoS query command?
> > With this approach, the data sharing between Xen and Dom0 tools should be
> > much less since we know:
> >  - How many processor sockets are active in the system.
> >  - How many active RMIDs are in use in the system.
> >
> > With above, we didn't need to always using the max_rmid * max_socket
> memory
> > to share a lot of unnecessary "zero" data, and the new sharing data should be
> > less than one page.
> > What's your opinion about that?
> 
> If you can get this to work, that would seem like a pretty optimal
> solution.

Previously, due to the large amount of data, we had Xen share the CQM pages with Dom0 read-only to avoid copying data.
However, since the amount of data is reduced a lot with the above approach, do you think we still need this sharing scheme? Or should we do what most other sysctl/domctl commands (e.g., xl list) do, and use copy_to_guest() to pass the data?

Thanks,
Dongxiao


> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:02             ` Xu, Dongxiao
@ 2014-03-18 10:09               ` Jan Beulich
  2014-03-18 10:15                 ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-18 10:09 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> Previously due to the large amount of data, we use the method to let Xen 
> share the CQM pages to Dom0 in read only way to avoid data copy.
> However since the data amount is reduced a lot with above approach, do you 
> think whether we still need to use this share way? Or like most of the other 
> sysctl/domctl (e.g., xl list) command, which uses copy_to_guest() to pass the 
> data? 

Iirc we're talking about a square table with each dimension being the
socket count. Considering an 8-node 4-socket system, that would
still be 1024 entries, i.e. exceeding a page in size. Hence I would
think that the read-only sharing approach might still be better. Of
course, if the data amount was smaller (and by so much that even
on huge systems it's no more than a page), that would be different.
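
Spelling out the arithmetic: 8 nodes x 4 sockets = 32 sockets, and
32 * 32 = 1024 entries; at 8 bytes per uint64_t entry that is 8KiB,
i.e. two 4KiB pages.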

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:09               ` Jan Beulich
@ 2014-03-18 10:15                 ` Xu, Dongxiao
  2014-03-18 10:28                   ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18 10:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, March 18, 2014 6:09 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> > Previously due to the large amount of data, we use the method to let Xen
> > share the CQM pages to Dom0 in read only way to avoid data copy.
> > However since the data amount is reduced a lot with above approach, do you
> > think whether we still need to use this share way? Or like most of the other
> > sysctl/domctl (e.g., xl list) command, which uses copy_to_guest() to pass the
> > data?
> 
> Iirc we're talking about a square table with each dimension being the
> socket count. Considering an 8-node 4-socket system, that would
> still be 1024 entries, i.e. exceeding a page in size. Hence I would
> think that the read-only sharing approach might still be better. Of
> course, if the data amount was smaller (and by so much that even
> on huge systems it's no more than a page), that would be different.

Okay.

By using dynamic memory allocation together with the data sharing mechanism, we may need two hypercalls when the Dom0 tool stack queries CQM-related info.
 - The 1st hypercall lets Xen allocate the memory and put the CQM data there.
 - The 2nd hypercall indicates that the Dom0 tool stack has digested the CQM data, so Xen can free the memory.

Does it sound reasonable to you?

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:15                 ` Xu, Dongxiao
@ 2014-03-18 10:28                   ` Jan Beulich
  2014-03-18 10:46                     ` Xu, Dongxiao
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-18 10:28 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, March 18, 2014 6:09 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> keir@xen.org 
>> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>> 
>> >>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> > Previously due to the large amount of data, we use the method to let Xen
>> > share the CQM pages to Dom0 in read only way to avoid data copy.
>> > However since the data amount is reduced a lot with above approach, do you
>> > think whether we still need to use this share way? Or like most of the 
> other
>> > sysctl/domctl (e.g., xl list) command, which uses copy_to_guest() to pass 
> the
>> > data?
>> 
>> Iirc we're talking about a square table with each dimension being the
>> socket count. Considering an 8-node 4-socket system, that would
>> still be 1024 entries, i.e. exceeding a page in size. Hence I would
>> think that the read-only sharing approach might still be better. Of
>> course, if the data amount was smaller (and by so much that even
>> on huge systems it's no more than a page), that would be different.
> 
> Okay.
> 
> By using dynamic memory allocation and data sharing mechanism, we may need 
> two hypercalls when Dom0 tool stack is querying CQM related info.
>  - 1st hypercall is to let Xen allocate the memory and put CQM data there.
>  - 2nd hypercall is to indicate Dom0 tool stack already digested the CQM data 
> and Xen can free the memory.

Why would that memory ever need de-allocating?

Anyway, could you clarify again what amount of data we're
talking about, without me having to dig out the old patch series?

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:28                   ` Jan Beulich
@ 2014-03-18 10:46                     ` Xu, Dongxiao
  2014-03-18 10:51                       ` Andrew Cooper
  2014-03-18 10:58                       ` Jan Beulich
  0 siblings, 2 replies; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18 10:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, March 18, 2014 6:28 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Tuesday, March 18, 2014 6:09 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> >> keir@xen.org
> >> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> >> feature
> >>
> >> >>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> > Previously due to the large amount of data, we use the method to let Xen
> >> > share the CQM pages to Dom0 in read only way to avoid data copy.
> >> > However since the data amount is reduced a lot with above approach, do
> you
> >> > think whether we still need to use this share way? Or like most of the
> > other
> >> > sysctl/domctl (e.g., xl list) command, which uses copy_to_guest() to pass
> > the
> >> > data?
> >>
> >> Iirc we're talking about a square table with each dimension being the
> >> socket count. Considering an 8-node 4-socket system, that would
> >> still be 1024 entries, i.e. exceeding a page in size. Hence I would
> >> think that the read-only sharing approach might still be better. Of
> >> course, if the data amount was smaller (and by so much that even
> >> on huge systems it's no more than a page), that would be different.
> >
> > Okay.
> >
> > By using dynamic memory allocation and data sharing mechanism, we may
> need
> > two hypercalls when Dom0 tool stack is querying CQM related info.
> >  - 1st hypercall is to let Xen allocate the memory and put CQM data there.
> >  - 2nd hypercall is to indicate Dom0 tool stack already digested the CQM data
> > and Xen can free the memory.
> 
> Why would that memory ever need de-allocating?
> 
> Anyway, could you clarify again what amount of data we're
> talking about, without me having to dig out the old patch series?

Originally we statically allocated the memory at initialization time, with size "rmid_max * socket_max", which may be a very big value. As proposed in today's first mail, we can instead dynamically allocate "rmid_inuse * socket_inuse" for the CQM-related pages when the user actually issues a query operation. This saves allocated memory, because at that point we know the exact socket count in the system and the exact RMIDs in use, and have no need to assume the maximum values.

Back to the above question of why the memory needs de-allocating:
Since rmid_inuse and socket_inuse may change from time to time, the allocation size will differ. Therefore we need to allocate the memory when the user issues the hypercall, and then free it after the data has been digested.

Not sure whether I stated the problem clearly for you.

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:46                     ` Xu, Dongxiao
@ 2014-03-18 10:51                       ` Andrew Cooper
  2014-03-18 14:44                         ` Xu, Dongxiao
  2014-03-18 10:58                       ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2014-03-18 10:51 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, Jan Beulich, dgdegra@tycho.nsa.gov

On 18/03/14 10:46, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, March 18, 2014 6:28 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> keir@xen.org
>> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>>
>>>>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>>>>  -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Tuesday, March 18, 2014 6:09 PM
>>>> To: Xu, Dongxiao
>>>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>>>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>>>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>>>> keir@xen.org
>>>> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
>>>> feature
>>>>
>>>>>>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>>>>> Previously due to the large amount of data, we use the method to let Xen
>>>>> share the CQM pages to Dom0 in read only way to avoid data copy.
>>>>> However since the data amount is reduced a lot with above approach, do
>> you
>>>>> think whether we still need to use this share way? Or like most of the
>>> other
>>>>> sysctl/domctl (e.g., xl list) command, which uses copy_to_guest() to pass
>>> the
>>>>> data?
>>>> Iirc we're talking about a square table with each dimension being the
>>>> socket count. Considering an 8-node 4-socket system, that would
>>>> still be 1024 entries, i.e. exceeding a page in size. Hence I would
>>>> think that the read-only sharing approach might still be better. Of
>>>> course, if the data amount was smaller (and by so much that even
>>>> on huge systems it's no more than a page), that would be different.
>>> Okay.
>>>
>>> By using dynamic memory allocation and data sharing mechanism, we may
>> need
>>> two hypercalls when Dom0 tool stack is querying CQM related info.
>>>  - 1st hypercall is to let Xen allocate the memory and put CQM data there.
>>>  - 2nd hypercall is to indicate Dom0 tool stack already digested the CQM data
>>> and Xen can free the memory.
>> Why would that memory ever need de-allocating?
>>
>> Anyway, could you clarify again what amount of data we're
>> talking about, without me having to dig out the old patch series?
> Originally we statically allocate memory in initialization time, and the size is "rmid_max * socket_max", which may be a very big value. As the propose in today's first mail, we can use the dynamic memory allocation as "rmid_inuse * socket_inuse" for CQM related pages when user is really issuing query operations, this can save the allocated memory size, because at that point, we know the exact socket number in the system and exact the in use RMID, and no need to predict them as maximum values.
>
> Back to the above question why memory need de-allocating:
> Since the rmid_inuse and socket_inuse may be changing from time to time, the allocating memory size will be different. Therefore we need to allocate them when user issues the hypercall, and then free them after the data digest is finished.
>
> Not sure whether I stated the problem clearly for you.
>
> Thanks,
> Dongxiao

There is a sensible upper bound for rmid_max in the init function.
There should be a set of pointers (one per socket), allocated on first
use, each of which can hold rmid_max entries.

Once allocated, they are large enough for any eventuality, and don't
need de/reallocating. This way, the amount of memory used is
sockets_inuse * rmid_max.
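
A sketch of that shape (names hypothetical, not from the series):

    static uint64_t **cqm_l3c;   /* nr_sockets entries, set up at boot */

    static uint64_t *get_socket_l3c(unsigned int socket)
    {
        if ( !cqm_l3c[socket] )
            cqm_l3c[socket] = xzalloc_array(uint64_t, cqm->max_rmid + 1);

        return cqm_l3c[socket];   /* NULL on allocation failure */
    }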

~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:46                     ` Xu, Dongxiao
  2014-03-18 10:51                       ` Andrew Cooper
@ 2014-03-18 10:58                       ` Jan Beulich
  2014-03-18 14:33                         ` Xu, Dongxiao
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2014-03-18 10:58 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 11:46, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> > With dynamic memory allocation plus the data sharing mechanism, we
>> > may need two hypercalls when the Dom0 tool stack queries CQM related
>> > info:
>> >  - a 1st hypercall to let Xen allocate the memory and put the CQM
>> >    data there;
>> >  - a 2nd hypercall to indicate that the Dom0 tool stack has digested
>> >    the CQM data and Xen can free the memory.
>> 
>> Why would that memory ever need de-allocating?
>> 
>> Anyway, could you clarify again what amount of data we're
>> talking about, without me having to dig out the old patch series?
> 
> Originally we statically allocated the memory at initialization time,
> sized as "rmid_max * socket_max", which may be a very big value. As
> proposed in today's first mail, we can instead dynamically allocate
> "rmid_inuse * socket_inuse" worth of CQM pages only when the user
> actually issues a query operation. This reduces the allocated memory
> size, because at that point we know the exact number of sockets in the
> system and the exact RMIDs in use, with no need to assume the maximum
> values.

That wasn't my question. The question (tailored to the above
description of yours) is - what are reasonable values of rmid_inuse
and socket_inuse on a huge system?

> Back to the above question of why the memory needs de-allocating:
> since rmid_inuse and socket_inuse may change over time, the allocation
> size will differ from query to query. Therefore we need to allocate
> the memory when the user issues the hypercall, and free it after the
> data digest is finished.

But once we have topology information in hand, deriving the
maximum possible socket number from MADT data ought to be
possible. Hence the allocation could be done with the maximum
size. An alternative (allowing the allocated size to be limited to
in-use resources) might be to do the de-allocation during CPU
hotplug rather than upon explicit Dom0 request.
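
For illustration, a minimal sketch of that hotplug-driven alternative
(the names are made up for the example, not existing Xen interfaces):

/* Tie buffer lifetime to CPU hotplug -- hypothetical names. */
#include <stdint.h>
#include <stdlib.h>

#define NR_SOCKETS_MAX 64

extern uint64_t *cqm_data[NR_SOCKETS_MAX];  /* per-socket buffers */
extern unsigned int rmid_max;

static unsigned int socket_cpus[NR_SOCKETS_MAX]; /* online CPUs/socket */

/* First CPU of a socket coming online: make sure a buffer exists. */
void cqm_cpu_up(unsigned int socket)
{
    if ( socket_cpus[socket]++ == 0 && !cqm_data[socket] )
        cqm_data[socket] = calloc(rmid_max, sizeof(uint64_t));
}

/* Last CPU of a socket going offline: free the buffer, so the
 * footprint tracks in-use sockets without any explicit Dom0 request. */
void cqm_cpu_down(unsigned int socket)
{
    if ( --socket_cpus[socket] == 0 )
    {
        free(cqm_data[socket]);
        cqm_data[socket] = NULL;
    }
}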

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:58                       ` Jan Beulich
@ 2014-03-18 14:33                         ` Xu, Dongxiao
  2014-03-18 15:26                           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18 14:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, March 18, 2014 6:58 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> >>> On 18.03.14 at 11:46, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> > With dynamic memory allocation plus the data sharing mechanism, we
> >> > may need two hypercalls when the Dom0 tool stack queries CQM related
> >> > info:
> >> >  - a 1st hypercall to let Xen allocate the memory and put the CQM
> >> >    data there;
> >> >  - a 2nd hypercall to indicate that the Dom0 tool stack has digested
> >> >    the CQM data and Xen can free the memory.
> >>
> >> Why would that memory ever need de-allocating?
> >>
> >> Anyway, could you clarify again what amount of data we're
> >> talking about, without me having to dig out the old patch series?
> >
> > Originally we statically allocated the memory at initialization time,
> > sized as "rmid_max * socket_max", which may be a very big value. As
> > proposed in today's first mail, we can instead dynamically allocate
> > "rmid_inuse * socket_inuse" worth of CQM pages only when the user
> > actually issues a query operation. This reduces the allocated memory
> > size, because at that point we know the exact number of sockets in the
> > system and the exact RMIDs in use, with no need to assume the maximum
> > values.
> 
> That wasn't my question. The question (tailored to the above
> description of yours) is - what are reasonable values of rmid_inuse
> and socket_inuse on a huge system?

According to SDM figure 17-20, the maximum possible RMID value is about
2^10 = 1024. rmid_inuse depends largely on the number of launched VMs,
assuming the administrator is interested in their LLC usage. As for the
socket count, I think 2- and 4-socket systems are the common case in
the market.
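
As a rough worst-case calculation (assuming 8 bytes of data per RMID
per socket, and made-up machine sizes for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned int rmid_max = 1024;  /* 2^10, per SDM figure 17-20 */
    unsigned int sockets  = 8;     /* already a fairly large server */
    size_t per_socket = rmid_max * sizeof(uint64_t);  /* 8192 bytes */

    /* Prints: per socket 8192 bytes, total 64 KiB. */
    printf("per socket %zu bytes, total %zu KiB\n",
           per_socket, (size_t)sockets * per_socket / 1024);
    return 0;
}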

> 
> > Back to the above question of why the memory needs de-allocating:
> > since rmid_inuse and socket_inuse may change over time, the allocation
> > size will differ from query to query. Therefore we need to allocate
> > the memory when the user issues the hypercall, and free it after the
> > data digest is finished.
> 
> But once we have topology information in hand, deriving the
> maximum possible socket number from MADT data ought to be
> possible. Hence the allocation could be done with the maximum
> size. An alternative (allowing the allocated size to be limited to
> in-use resources) might be to do the de-allocation during CPU
> hotplug rather than upon explicit Dom0 request.

A system may boot with no CPU plugged into some sockets, so can we
still get the full map in that case? I think it may be difficult...

Thanks,
Dongxiao

> 
> Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 10:51                       ` Andrew Cooper
@ 2014-03-18 14:44                         ` Xu, Dongxiao
  2014-03-18 15:42                           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Xu, Dongxiao @ 2014-03-18 14:44 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, Jan Beulich, dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, March 18, 2014 6:51 PM
> To: Xu, Dongxiao
> Cc: Jan Beulich; Ian.Campbell@citrix.com; Ian.Jackson@eu.citrix.com;
> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> Subject: Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
> 
> On 18/03/14 10:46, Xu, Dongxiao wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Tuesday, March 18, 2014 6:28 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> >> keir@xen.org
> >> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> >> feature
> >>
> >>>>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >>>>  -----Original Message-----
> >>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>> Sent: Tuesday, March 18, 2014 6:09 PM
> >>>> To: Xu, Dongxiao
> >>>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >>>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> >>>> xen-devel@lists.xen.org; konrad.wilk@oracle.com;
> dgdegra@tycho.nsa.gov;
> >>>> keir@xen.org
> >>>> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
> >>>> feature
> >>>>
> >>>>>>> On 18.03.14 at 11:02, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >>>>> Previously, due to the large amount of data, we let Xen share the
> >>>>> CQM pages with Dom0 read-only, to avoid copying the data. However,
> >>>>> since the data amount is much reduced with the above approach, do you
> >>>>> think we still need this sharing scheme? Or should we, like most other
> >>>>> sysctl/domctl commands (e.g. xl list), use copy_to_guest() to pass the
> >>>>> data?
> >>>> Iirc we're talking about a square table with each dimension being the
> >>>> socket count. Considering an 8-node 4-socket system, that would
> >>>> still be 1024 entries, i.e. exceeding a page in size. Hence I would
> >>>> think that the read-only sharing approach might still be better. Of
> >>>> course, if the data amount was smaller (and by so much that even
> >>>> on huge systems it's no more than a page), that would be different.
> >>> Okay.
> >>>
> >>> With dynamic memory allocation plus the data sharing mechanism, we
> >>> may need two hypercalls when the Dom0 tool stack queries CQM related
> >>> info:
> >>>  - a 1st hypercall to let Xen allocate the memory and put the CQM
> >>>    data there;
> >>>  - a 2nd hypercall to indicate that the Dom0 tool stack has digested
> >>>    the CQM data and Xen can free the memory.
> >> Why would that memory ever need de-allocating?
> >>
> >> Anyway, could you clarify again what amount of data we're
> >> talking about, without me having to dig out the old patch series?
> > Originally we statically allocated the memory at initialization time,
> > sized as "rmid_max * socket_max", which may be a very big value. As
> > proposed in today's first mail, we can instead dynamically allocate
> > "rmid_inuse * socket_inuse" worth of CQM pages only when the user
> > actually issues a query operation. This reduces the allocated memory
> > size, because at that point we know the exact number of sockets in the
> > system and the exact RMIDs in use, with no need to assume the maximum
> > values.
> >
> > Back to the above question of why the memory needs de-allocating:
> > since rmid_inuse and socket_inuse may change over time, the allocation
> > size will differ from query to query. Therefore we need to allocate
> > the memory when the user issues the hypercall, and free it after the
> > data digest is finished.
> >
> > Not sure whether I stated the problem clearly for you.
> >
> > Thanks,
> > Dongxiao
> 
> There is a sensible upper bound for rmid_max, in the init function.
> There should be a set of pointers (one per socket), allocated on use,
> which can contain rmid_max data.
> 
> Once allocated, they are large enough for any eventuality, and don't
> need de/reallocating. This way, the amount of memory used is
> sockets_inuse * rmid_max.

Hmm, this might be a good proposal: according to the SDM (figure
17-20), the maximum possible RMID value is 2^10 = 1024, so two pages
per socket are enough (1024 * 8 bytes = 8192 bytes). But we would need
to track such a per-socket structure in the CPU online/offline logic.
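
Roughly, the per-socket allocation I am thinking of (hypothetical
helper, not a real Xen API):

#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define RMID_MAX  1024
/* 1024 RMIDs * 8 bytes = 8192 bytes -> exactly 2 pages per socket. */
#define CQM_PAGES ((RMID_MAX * sizeof(uint64_t) + PAGE_SIZE - 1) / PAGE_SIZE)

/* Page-aligned so the pages could later be shared read-only. */
uint64_t *cqm_alloc_socket_pages(void)
{
    return aligned_alloc(PAGE_SIZE, CQM_PAGES * PAGE_SIZE);
}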

Jan, what's your opinion?

Thanks,
Dongxiao

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 14:33                         ` Xu, Dongxiao
@ 2014-03-18 15:26                           ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-03-18 15:26 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, andrew.cooper3@citrix.com,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 15:33, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, March 18, 2014 6:58 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
>> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
>> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
>> keir@xen.org 
>> Subject: RE: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>> 
>> >>> On 18.03.14 at 11:46, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >>> On 18.03.14 at 11:15, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> >> > With dynamic memory allocation plus the data sharing mechanism, we
>> >> > may need two hypercalls when the Dom0 tool stack queries CQM related
>> >> > info:
>> >> >  - a 1st hypercall to let Xen allocate the memory and put the CQM
>> >> >    data there;
>> >> >  - a 2nd hypercall to indicate that the Dom0 tool stack has digested
>> >> >    the CQM data and Xen can free the memory.
>> >>
>> >> Why would that memory ever need de-allocating?
>> >>
>> >> Anyway, could you clarify again what amount of data we're
>> >> talking about, without me having to dig out the old patch series?
>> >
>> > Originally we statically allocated the memory at initialization time,
>> > sized as "rmid_max * socket_max", which may be a very big value. As
>> > proposed in today's first mail, we can instead dynamically allocate
>> > "rmid_inuse * socket_inuse" worth of CQM pages only when the user
>> > actually issues a query operation. This reduces the allocated memory
>> > size, because at that point we know the exact number of sockets in the
>> > system and the exact RMIDs in use, with no need to assume the maximum
>> > values.
>> 
>> That wasn't my question. The question (tailored to the above
>> description of yours) is - what are reasonable values of rmid_inuse
>> and socket_inuse on a huge system?
> 
> According to SDM figure 17-20, the maximum possible RMID value is about
> 2^10 = 1024. rmid_inuse depends largely on the number of launched VMs,
> assuming the administrator is interested in their LLC usage. As for the
> socket count, I think 2- and 4-socket systems are the common case in
> the market.

That's (a) not considering huge systems and (b) not even considering
multi-node ones.

>> > Back to the above question of why the memory needs de-allocating:
>> > since rmid_inuse and socket_inuse may change over time, the allocation
>> > size will differ from query to query. Therefore we need to allocate
>> > the memory when the user issues the hypercall, and free it after the
>> > data digest is finished.
>> 
>> But once we have topology information in hand, deriving the
>> maximum possible socket number from MADT data ought to be
>> possible. Hence the allocation could be done with the maximum
>> size. An alternative (allowing the allocated size to be limited to
>> in-use resources) might be to do the de-allocation during CPU
>> hotplug rather than upon explicit Dom0 request.
> 
> A system may boot with no CPU plugged into some sockets, so can we
> still get the full map in that case? I think it may be difficult...

Afaik the full topology has to be available, with hot-pluggable but
not present CPUs marked accordingly. We use this information to
e.g. set nr_cpu_ids.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature
  2014-03-18 14:44                         ` Xu, Dongxiao
@ 2014-03-18 15:42                           ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2014-03-18 15:42 UTC (permalink / raw)
  To: Dongxiao Xu
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Andrew Cooper,
	stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com,
	xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov

>>> On 18.03.14 at 15:44, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> There is a sensible upper bound for rmid_max, in the init function.
>> There should be a set of pointers (one per socket), allocated on use,
>> which can contain rmid_max data.
>> 
>> Once allocated, they are large enough for any eventuality, and don't
>> need de/reallocating. This way, the amount of memory used is
>> sockets_inuse * rmid_max.
> 
> Hmm, this might be a good proposal: according to the SDM (figure
> 17-20), the maximum possible RMID value is 2^10 = 1024, so two pages
> per socket are enough (1024 * 8 bytes = 8192 bytes). But we would need
> to track such a per-socket structure in the CPU online/offline logic.
> 
> Jan, what's your opinion?

Sounds reasonable if you can make it work cleanly.

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-03-18 15:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-19  6:32 [PATCH v9 0/6] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
2014-02-19  6:32 ` [PATCH v9 1/6] x86: detect and initialize Cache QoS Monitoring feature Dongxiao Xu
2014-02-24 14:12   ` Jan Beulich
2014-03-03 13:21     ` Xu, Dongxiao
2014-03-04  8:10       ` Jan Beulich
2014-03-18  2:02         ` Xu, Dongxiao
2014-03-18  9:57           ` Jan Beulich
2014-03-18 10:02             ` Xu, Dongxiao
2014-03-18 10:09               ` Jan Beulich
2014-03-18 10:15                 ` Xu, Dongxiao
2014-03-18 10:28                   ` Jan Beulich
2014-03-18 10:46                     ` Xu, Dongxiao
2014-03-18 10:51                       ` Andrew Cooper
2014-03-18 14:44                         ` Xu, Dongxiao
2014-03-18 15:42                           ` Jan Beulich
2014-03-18 10:58                       ` Jan Beulich
2014-03-18 14:33                         ` Xu, Dongxiao
2014-03-18 15:26                           ` Jan Beulich
2014-02-19  6:32 ` [PATCH v9 2/6] x86: dynamically attach/detach CQM service for a guest Dongxiao Xu
2014-02-24 14:15   ` Jan Beulich
2014-02-19  6:32 ` [PATCH v9 3/6] x86: collect CQM information from all sockets Dongxiao Xu
2014-02-24 14:23   ` Jan Beulich
2014-02-19  6:32 ` [PATCH v9 4/6] x86: enable CQM monitoring for each domain RMID Dongxiao Xu
2014-02-24 14:26   ` Jan Beulich
2014-02-19  6:32 ` [PATCH v9 5/6] xsm: add platform QoS related xsm policies Dongxiao Xu
2014-02-19  6:32 ` [PATCH v9 6/6] tools: enable Cache QoS Monitoring feature for libxl/libxc Dongxiao Xu
2014-02-19  9:39   ` Ian Campbell
2014-03-03 13:46     ` Xu, Dongxiao
