xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/3] x86: support further Intel CPU families
@ 2014-03-05 10:34 Jan Beulich
  2014-03-05 10:36 ` [PATCH 1/3] " Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 10:34 UTC
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

1: x86: Intel CPU family update
2: x86/idle: update to include further package/core residency MSRs
3: xenpm: use new Cx statistics interface

Signed-off-by: Jan Beulich <jbeulich@suse.com>


* [PATCH 1/3] x86: support further Intel CPU families
  2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
@ 2014-03-05 10:36 ` Jan Beulich
  2014-03-18  2:44   ` Tian, Kevin
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 10:36 UTC
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

... according to revision 49 of the Intel SDM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Intel: Please confirm that I correctly resolved the ambiguity the manual
has for 06_4D: Table 35-1 lists this among the Silvermont models and uses
06_4E for Future Generation Intel Core; section 35.1 and table 35-24,
however, use 06_4D throughout. My take is that the latter two are what
is wrong.

--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -139,6 +139,9 @@ static void do_get_hw_residencies(void *
     case 0x3F:
     case 0x45:
     case 0x46:
+    /* future */
+    case 0x3D:
+    case 0x4E:
         GET_PC2_RES(hw_res->pc2);
         GET_CC7_RES(hw_res->cc7);
         /* fall through */
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1966,10 +1966,14 @@ static const struct lbr_info *last_branc
         case 58: case 62:
         /* Haswell */
         case 60: case 63: case 69: case 70:
+        /* future */
+        case 61: case 78:
             return nh_lbr;
             break;
         /* Atom */
-        case 28:
+        case 28: case 38: case 39: case 53: case 54:
+        /* Silvermont */
+        case 55: case 74: case 77: case 90: case 93:
             return at_lbr;
             break;
         }
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -916,6 +916,10 @@ int vmx_vpmu_initialise(struct vcpu *v, 
         case 0x3f:
         case 0x45:
         case 0x46:
+
+        /* future: */
+        case 0x3d:
+        case 0x4e:
             ret = core2_vpmu_initialise(v, vpmu_flags);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;




[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
  2014-03-05 10:36 ` [PATCH 1/3] " Jan Beulich
@ 2014-03-05 10:37 ` Jan Beulich
  2014-03-05 10:42   ` Jan Beulich
                     ` (3 more replies)
  2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 10:37 UTC
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

[-- Attachment #1: Type: text/plain, Size: 13908 bytes --]

With the number of these MSRs growing, it becomes increasingly desirable
not to alter the sysctl interface each time to accommodate them. Replace
the explicit listing of numbered states with arrays, whose unused entries
are left untouched by the hypercall.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
+++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
@@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch, 
 
 int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
 {
-    DECLARE_SYSCTL;
-    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+    uint64_t pc[7], cc[7];
+    struct xc_cx_stat_v2 cxpt2 = {
+        .triggers = cxpt->triggers,
+        .residencies = cxpt->residencies,
+        .nr_pc = sizeof(pc) / sizeof(*pc),
+        .nr_cc = sizeof(cc) / sizeof(*cc),
+        .pc = pc,
+        .cc = cc
+    };
     int max_cx, ret;
 
     if( !cxpt->triggers || !cxpt->residencies )
         return -EINVAL;
 
     if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
-        goto unlock_0;
+        return ret;
 
-    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
-    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
+    cxpt2.nr = max_cx;
+    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);
+
+    cxpt->nr = cxpt2.nr;
+    cxpt->last = cxpt2.last;
+    cxpt->idle_time = cxpt2.idle_time;
+    cxpt->pc2 = pc[1];
+    cxpt->pc3 = pc[2];
+    cxpt->pc6 = pc[5];
+    cxpt->pc7 = pc[6];
+    cxpt->cc3 = cc[2];
+    cxpt->cc6 = cc[5];
+    cxpt->cc7 = cc[6];
+
+    return ret;
+}
+
+int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)
+{
+    DECLARE_SYSCTL;
+    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
+                                   cxpt->nr * sizeof(*cxpt->triggers),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
+                                   cxpt->nr * sizeof(*cxpt->residencies),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
+                                   cxpt->nr_pc * sizeof(*cxpt->pc),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
+                                   cxpt->nr_cc * sizeof(*cxpt->cc),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int ret = -1;
 
-    ret = -1;
     if ( xc_hypercall_bounce_pre(xch, triggers) )
         goto unlock_0;
     if ( xc_hypercall_bounce_pre(xch, residencies) )
         goto unlock_1;
+    if ( xc_hypercall_bounce_pre(xch, pc) )
+        goto unlock_2;
+    if ( xc_hypercall_bounce_pre(xch, cc) )
+        goto unlock_3;
 
     sysctl.cmd = XEN_SYSCTL_get_pmstat;
     sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
     sysctl.u.get_pmstat.cpuid = cpuid;
+    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
+    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
+    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
     set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
     set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
+    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
+    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
 
     if ( (ret = xc_sysctl(xch, &sysctl)) )
-        goto unlock_2;
+        goto unlock_4;
 
     cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
     cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
     cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
-    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
-    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
-    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
-    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
-    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
-    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
-    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
+    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
+    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
 
+unlock_4:
+    xc_hypercall_bounce_post(xch, cc);
+unlock_3:
+    xc_hypercall_bounce_post(xch, pc);
 unlock_2:
     xc_hypercall_bounce_post(xch, residencies);
 unlock_1:
--- 2014-02-13.orig/tools/libxc/xenctrl.h	2014-03-04 17:43:06.000000000 +0100
+++ 2014-02-13/tools/libxc/xenctrl.h	2014-03-04 17:50:49.000000000 +0100
@@ -1934,7 +1934,7 @@ int xc_pm_get_max_px(xc_interface *xch, 
 int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct xc_px_stat *pxpt);
 int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
 
-struct xc_cx_stat {
+struct xc_cx_stat { /* DEPRECATED (use v2 below instead)! */
     uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
     uint32_t last;         /* last Cx state */
     uint64_t idle_time;    /* idle time from boot */
@@ -1950,8 +1950,22 @@ struct xc_cx_stat {
 };
 typedef struct xc_cx_stat xc_cx_stat_t;
 
+struct xc_cx_stat_v2 {
+    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
+    uint32_t last;         /* last Cx state */
+    uint64_t idle_time;    /* idle time from boot */
+    uint64_t *triggers;    /* Cx trigger counts */
+    uint64_t *residencies; /* Cx residencies */
+    uint32_t nr_pc;        /* entry nr in pc[] */
+    uint32_t nr_cc;        /* entry nr in cc[] */
+    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
+    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
+};
+typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
+
 int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
 int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt);
+int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *);
 int xc_pm_reset_cxstat(xc_interface *xch, int cpuid);
 
 int xc_cpu_online(xc_interface *xch, int cpu);
--- 2014-02-13.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:43:06.000000000 +0100
+++ 2014-02-13/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
@@ -62,13 +62,17 @@
 
 #define GET_HW_RES_IN_NS(msr, val) \
     do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
-#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
+#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
 #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
 #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
 #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
+#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
+#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
+#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
+#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
 #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
 #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
-#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
+#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
 
 static void lapic_timer_nop(void) { }
 void (*__read_mostly lapic_timer_off)(void);
@@ -111,8 +115,13 @@ struct hw_residencies
 {
     uint64_t pc2;
     uint64_t pc3;
+    uint64_t pc4;
     uint64_t pc6;
     uint64_t pc7;
+    uint64_t pc8;
+    uint64_t pc9;
+    uint64_t pc10;
+    uint64_t cc1;
     uint64_t cc3;
     uint64_t cc6;
     uint64_t cc7;
@@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
 
     switch ( c->x86_model )
     {
+    /* 4th generation Intel Core (Haswell) */
+    case 0x45:
+        GET_PC8_RES(hw_res->pc8);
+        GET_PC9_RES(hw_res->pc9);
+        GET_PC10_RES(hw_res->pc10);
+        /* fall through */
     /* Sandy bridge */
     case 0x2A:
     case 0x2D:
@@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
     /* Haswell */
     case 0x3C:
     case 0x3F:
-    case 0x45:
     case 0x46:
     /* future */
     case 0x3D:
@@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
         GET_CC3_RES(hw_res->cc3);
         GET_CC6_RES(hw_res->cc6);
         break;
+    /* various Atoms */
+    case 0x27:
+        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
+        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
+        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
+        break;
+    /* Silvermont */
+    case 0x37:
+    case 0x4A:
+    case 0x4D:
+    case 0x5A:
+    case 0x5D:
+        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
+        GET_CC1_RES(hw_res->cc1);
+        GET_CC6_RES(hw_res->cc6);
+        break;
     }
 }
 
@@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
 
     get_hw_residencies(cpu, &hw_res);
 
-    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
-           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
-    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
-           hw_res.cc3, hw_res.cc6,hw_res.cc7);
+    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
+           hw_res.pc2,
+           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
+           hw_res.pc6, hw_res.pc7);
+    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
+        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
+               hw_res.pc8, hw_res.pc9, hw_res.pc10);
+    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
+           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
+           hw_res.cc6, hw_res.cc7);
 }
 
 static char* acpi_cstate_method_name[] =
@@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
     struct acpi_processor_power *power = processor_powers[cpuid];
     uint64_t idle_usage = 0, idle_res = 0;
     uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
-    int i;
-    struct hw_residencies hw_res;
+    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
 
     if ( power == NULL )
     {
         stat->last = 0;
         stat->nr = 0;
         stat->idle_time = 0;
+        stat->nr_pc = 0;
+        stat->nr_cc = 0;
         return 0;
     }
 
     stat->last = power->last_state ? power->last_state->idx : 0;
     stat->idle_time = get_cpu_idle_time(cpuid);
+    nr = min(stat->nr, power->count);
 
     /* mimic the stat when detail info hasn't been registered by dom0 */
     if ( pm_idle_save == NULL )
@@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
 
         usage[1] = idle_usage = 1;
         res[1] = idle_res = stat->idle_time;
-
-        memset(&hw_res, 0, sizeof(hw_res));
     }
     else
     {
+        struct hw_residencies hw_res;
+
         stat->nr = power->count;
 
-        for ( i = 1; i < power->count; i++ )
+        for ( i = 1; i < nr; i++ )
         {
             spin_lock_irq(&power->stat_lock);
             usage[i] = power->states[i].usage;
@@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
         }
 
         get_hw_residencies(cpuid, &hw_res);
+
+#define PUT_xC(what, n) do { \
+        if ( stat->nr_##what >= n && \
+             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
+            return -EFAULT; \
+        if ( hw_res.what##n ) \
+            nr_##what = n; \
+    } while ( 0 )
+#define PUT_PC(n) PUT_xC(pc, n)
+        PUT_PC(2);
+        PUT_PC(3);
+        PUT_PC(4);
+        PUT_PC(6);
+        PUT_PC(7);
+        PUT_PC(8);
+        PUT_PC(9);
+        PUT_PC(10);
+#undef PUT_PC
+#define PUT_CC(n) PUT_xC(cc, n)
+        PUT_CC(1);
+        PUT_CC(3);
+        PUT_CC(6);
+        PUT_CC(7);
+#undef PUT_CC
+#undef PUT_xC
     }
 
     usage[0] = idle_usage;
     res[0] = NOW() - idle_res;
 
-    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
-         copy_to_guest(stat->residencies, res, stat->nr) )
+    if ( copy_to_guest(stat->triggers, usage, nr) ||
+         copy_to_guest(stat->residencies, res, nr) )
         return -EFAULT;
 
-    stat->pc2 = hw_res.pc2;
-    stat->pc3 = hw_res.pc3;
-    stat->pc6 = hw_res.pc6;
-    stat->pc7 = hw_res.pc7;
-    stat->cc3 = hw_res.cc3;
-    stat->cc6 = hw_res.cc6;
-    stat->cc7 = hw_res.cc7;
+    stat->nr_pc = nr_pc;
+    stat->nr_cc = nr_cc;
 
     return 0;
 }
--- 2014-02-13.orig/xen/include/public/sysctl.h	2014-03-04 17:43:06.000000000 +0100
+++ 2014-02-13/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
 
 /*
  * Read console content from Xen buffer ring.
@@ -226,13 +226,10 @@ struct pm_cx_stat {
     uint64_aligned_t idle_time;                 /* idle time from boot */
     XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
     XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
-    uint64_aligned_t pc2;
-    uint64_aligned_t pc3;
-    uint64_aligned_t pc6;
-    uint64_aligned_t pc7;
-    uint64_aligned_t cc3;
-    uint64_aligned_t cc6;
-    uint64_aligned_t cc7;
+    uint32_t nr_pc;                          /* entry nr in pc[] */
+    uint32_t nr_cc;                          /* entry nr in cc[] */
+    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
+    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
 };
 
 struct xen_sysctl_get_pmstat {



[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
  2014-03-05 10:36 ` [PATCH 1/3] " Jan Beulich
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
@ 2014-03-05 10:37 ` Jan Beulich
  2014-03-05 15:47   ` Boris Ostrovsky
                     ` (2 more replies)
  2014-03-12  9:38 ` Ping: [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
  2014-03-17 13:28 ` [PATCH v2 0/2] " Jan Beulich
  4 siblings, 3 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 10:37 UTC (permalink / raw)
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

[-- Attachment #1: Type: text/plain, Size: 8391 bytes --]

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -29,6 +29,9 @@
 #include <inttypes.h>
 #include <sys/time.h>
 
+#define MAX_PKG_RESIDENCIES 12
+#define MAX_CORE_RESIDENCIES 8
+
 #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
 
 static xc_interface *xc_handle;
@@ -100,9 +103,9 @@ static void parse_cpuid_and_int(int argc
     }
 }
 
-static void print_cxstat(int cpuid, struct xc_cx_stat *cxstat)
+static void print_cxstat(int cpuid, const struct xc_cx_stat_v2 *cxstat)
 {
-    int i;
+    unsigned int i;
 
     printf("cpu id               : %d\n", cpuid);
     printf("total C-states       : %d\n", cxstat->nr);
@@ -115,22 +118,20 @@ static void print_cxstat(int cpuid, stru
         printf("                       residency  [%20"PRIu64" ms]\n",
                cxstat->residencies[i]/1000000UL);
     }
-    printf("pc2                  : [%20"PRIu64" ms]\n"
-           "pc3                  : [%20"PRIu64" ms]\n"
-           "pc6                  : [%20"PRIu64" ms]\n"
-           "pc7                  : [%20"PRIu64" ms]\n",
-            cxstat->pc2/1000000UL, cxstat->pc3/1000000UL,
-            cxstat->pc6/1000000UL, cxstat->pc7/1000000UL);
-    printf("cc3                  : [%20"PRIu64" ms]\n"
-           "cc6                  : [%20"PRIu64" ms]\n"
-           "cc7                  : [%20"PRIu64" ms]\n",
-            cxstat->cc3/1000000UL, cxstat->cc6/1000000UL,
-            cxstat->cc7/1000000UL);
+    for ( i = 0; i < MAX_PKG_RESIDENCIES && i < cxstat->nr_pc; ++i )
+        if ( cxstat->pc[i] )
+           printf("pc%d                  : [%20"PRIu64" ms]\n", i + 1,
+                  cxstat->pc[i] / 1000000UL);
+    for ( i = 0; i < MAX_CORE_RESIDENCIES && i < cxstat->nr_cc; ++i )
+        if ( cxstat->cc[i] )
+           printf("cc%d                  : [%20"PRIu64" ms]\n", i + 1,
+                  cxstat->cc[i] / 1000000UL);
     printf("\n");
 }
 
 /* show cpu idle information on CPU cpuid */
-static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid, struct xc_cx_stat *cxstat)
+static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid,
+                               struct xc_cx_stat_v2 *cxstat)
 {
     int ret = 0;
     int max_cx_num = 0;
@@ -145,24 +146,36 @@ static int get_cxstat_by_cpuid(xc_interf
     if ( !max_cx_num )
         return -ENODEV;
 
-    cxstat->triggers = malloc(max_cx_num * sizeof(uint64_t));
-    if ( !cxstat->triggers )
-        return -ENOMEM;
-    cxstat->residencies = malloc(max_cx_num * sizeof(uint64_t));
-    if ( !cxstat->residencies )
+    cxstat->triggers = malloc(max_cx_num * sizeof(*cxstat->triggers));
+    cxstat->residencies = malloc(max_cx_num * sizeof(*cxstat->residencies));
+    cxstat->pc = malloc(MAX_PKG_RESIDENCIES * sizeof(*cxstat->pc));
+    cxstat->cc = malloc(MAX_CORE_RESIDENCIES * sizeof(*cxstat->cc));
+    if ( !cxstat->triggers || !cxstat->residencies ||
+         !cxstat->pc || !cxstat->cc )
     {
+        free(cxstat->cc);
+        free(cxstat->pc);
+        free(cxstat->residencies);
         free(cxstat->triggers);
         return -ENOMEM;
     }
 
-    ret = xc_pm_get_cxstat(xc_handle, cpuid, cxstat);
+    cxstat->nr = max_cx_num;
+    cxstat->nr_pc = MAX_PKG_RESIDENCIES;
+    cxstat->nr_cc = MAX_CORE_RESIDENCIES;
+
+    ret = xc_pm_get_cx_stat(xc_handle, cpuid, cxstat);
     if( ret )
     {
         ret = -errno;
         free(cxstat->triggers);
         free(cxstat->residencies);
+        free(cxstat->pc);
+        free(cxstat->cc);
         cxstat->triggers = NULL;
         cxstat->residencies = NULL;
+        cxstat->pc = NULL;
+        cxstat->cc = NULL;
     }
 
     return ret;
@@ -183,7 +196,7 @@ static int show_max_cstate(xc_interface 
 static int show_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid)
 {
     int ret = 0;
-    struct xc_cx_stat cxstatinfo;
+    struct xc_cx_stat_v2 cxstatinfo;
 
     ret = get_cxstat_by_cpuid(xc_handle, cpuid, &cxstatinfo);
     if ( ret )
@@ -198,6 +211,8 @@ static int show_cxstat_by_cpuid(xc_inter
 
     free(cxstatinfo.triggers);
     free(cxstatinfo.residencies);
+    free(cxstatinfo.pc);
+    free(cxstatinfo.cc);
     return 0;
 }
 
@@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
 }
 
 static uint64_t usec_start, usec_end;
-static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
+static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
 static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
 static int *avgfreq;
 static uint64_t *sum, *sum_cx, *sum_px;
@@ -482,25 +497,26 @@ static void signal_int_handler(int signo
             /* print out CC? and PC? */
             for ( i = 0; i < socket_nr; i++ )
             {
+                unsigned int n;
                 uint64_t res;
+
                 for ( j = 0; j <= info.max_cpu_index; j++ )
                 {
                     if ( cpu_to_socket[j] == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
-                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
-                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
-                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
-                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
-                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
+                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
+                {
+                    if ( n >= cxstat_end[j].nr_pc )
+                        continue;
+                    res = cxstat_end[j].pc[n];
+                    if ( n < cxstat_start[j].nr_pc )
+                        res -= cxstat_start[j].pc[n];
+                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
+                           n + 1, res / 1000000UL,
+                           100UL * res / (double)sum_cx[j]);
+                }
                 for ( k = 0; k < core_nr; k++ )
                 {
                     for ( j = 0; j <= info.max_cpu_index; j++ )
@@ -510,15 +526,17 @@ static void signal_int_handler(int signo
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
-                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
-                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                           100UL * res / (double)sum_cx[j]);
-                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
-                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                           100UL * res / (double)sum_cx[j]);
-                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
-                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
-                           100UL * res / (double)sum_cx[j]);
+                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
+                    {
+                        if ( n >= cxstat_end[j].nr_cc )
+                            continue;
+                        res = cxstat_end[j].cc[n];
+                        if ( n < cxstat_start[j].nr_cc )
+                            res -= cxstat_start[j].cc[n];
+                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
+                               n + 1, res / 1000000UL,
+                               100UL * res / (double)sum_cx[j]);
+                    }
                 }
             }
         }
@@ -529,6 +547,8 @@ static void signal_int_handler(int signo
     {
         free(cxstat[i].triggers);
         free(cxstat[i].residencies);
+        free(cxstat[i].pc);
+        free(cxstat[i].cc);
         free(pxstat[i].trans_pt);
         free(pxstat[i].pt);
     }
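
For illustration only, the per-state delta computed in the two loops above
can be sketched in isolation as follows. The names struct demo_res and
demo_delta() are made up for this sketch and are not part of xenpm or
libxc. It assumes, as the patch does, that slot n of the array holds the
residency of state n + 1, and that a slot missing from the start snapshot
counts as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of one residency array (pc[] or cc[]). */
struct demo_res {
    uint32_t nr;         /* number of valid slots in val[] */
    const uint64_t *val; /* val[n] holds the residency of state n + 1 */
};

/*
 * Residency accumulated in state "state" (1-based) between two snapshots.
 * Returns 0 if the end snapshot has no slot for that state.
 */
static uint64_t demo_delta(const struct demo_res *start,
                           const struct demo_res *end,
                           unsigned int state)
{
    unsigned int n;
    uint64_t res;

    assert(state >= 1);
    n = state - 1;       /* 1-biased indexing: PC2 lives in slot 1 */
    if ( n >= end->nr )
        return 0;
    res = end->val[n];
    if ( n < start->nr )
        res -= start->val[n];

    return res;
}
```

Calling demo_delta(&start, &end, 2) would then yield what the loop prints
as PC2 (before the conversion to milliseconds).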




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
@ 2014-03-05 10:42   ` Jan Beulich
  2014-03-18  2:44     ` Tian, Kevin
  2014-03-05 15:07   ` Boris Ostrovsky
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 10:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

>>> On 05.03.14 at 11:37, "Jan Beulich" <JBeulich@suse.com> wrote:
> With the number of these growing it becomes increasingly desirable to
> not repeatedly alter the sysctl interface to accommodate them. Replace
> the explicit listing of numbered states by arrays, unused fields of
> which will remain untouched by the hypercall.

Just added this to the description:

"The adjusted sysctl interface at the same time fixes an unrelated
 shortcoming of the original one: The "nr" field, specifying the size of
 the "triggers" and "residencies" arrays, has to be an input (along with
 being an output), which the previous implementation didn't honor."

Sorry for forgetting the first time through.
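
To illustrate the in/out convention for "nr" outside of Xen, here is a
minimal sketch. Everything in it is hypothetical: struct demo_cx_stat is
not the real sysctl layout, and plain memcpy() stands in for
copy_to_guest(). The caller passes the capacity of its array in nr; the
producer copies at most that many entries and writes back how many
entries exist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sysctl structure; not the real layout. */
struct demo_cx_stat {
    uint32_t nr;        /* IN: capacity of triggers[]; OUT: states available */
    uint64_t *triggers; /* caller-provided buffer */
};

/*
 * Copy at most stat->nr entries from src[0..avail) and report the number
 * of entries available. Returns the number of entries actually copied.
 */
static uint32_t demo_get_cx_stat(struct demo_cx_stat *stat,
                                 const uint64_t *src, uint32_t avail)
{
    uint32_t n = stat->nr < avail ? stat->nr : avail;

    assert(stat->triggers != NULL || n == 0);
    if ( n )
        memcpy(stat->triggers, src, n * sizeof(*src));
    stat->nr = avail;   /* a value above the capacity signals truncation */

    return n;
}
```

A caller that gets back an nr larger than the capacity it passed in knows
its buffer was too small and can retry with a bigger one.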

Jan

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch, 
>  
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>  {
> -    DECLARE_SYSCTL;
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +    uint64_t pc[7], cc[7];
> +    struct xc_cx_stat_v2 cxpt2 = {
> +        .triggers = cxpt->triggers,
> +        .residencies = cxpt->residencies,
> +        .nr_pc = sizeof(pc) / sizeof(*pc),
> +        .nr_cc = sizeof(cc) / sizeof(*cc),
> +        .pc = pc,
> +        .cc = cc
> +    };
>      int max_cx, ret;
>  
>      if( !cxpt->triggers || !cxpt->residencies )
>          return -EINVAL;
>  
>      if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
> -        goto unlock_0;
> +        return ret;
>  
> -    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
> -    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
> +    cxpt2.nr = max_cx;
> +    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);
> +
> +    cxpt->nr = cxpt2.nr;
> +    cxpt->last = cxpt2.last;
> +    cxpt->idle_time = cxpt2.idle_time;
> +    cxpt->pc2 = pc[1];
> +    cxpt->pc3 = pc[2];
> +    cxpt->pc6 = pc[5];
> +    cxpt->pc7 = pc[6];
> +    cxpt->cc3 = cc[2];
> +    cxpt->cc6 = cc[5];
> +    cxpt->cc7 = cc[6];
> +
> +    return ret;
> +}
> +
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)
> +{
> +    DECLARE_SYSCTL;
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> +                                   cxpt->nr * sizeof(*cxpt->triggers),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> +                                   cxpt->nr * sizeof(*cxpt->residencies),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
> +                                   cxpt->nr_pc * sizeof(*cxpt->pc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
> +                                   cxpt->nr_cc * sizeof(*cxpt->cc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    int ret = -1;
>  
> -    ret = -1;
>      if ( xc_hypercall_bounce_pre(xch, triggers) )
>          goto unlock_0;
>      if ( xc_hypercall_bounce_pre(xch, residencies) )
>          goto unlock_1;
> +    if ( xc_hypercall_bounce_pre(xch, pc) )
> +        goto unlock_2;
> +    if ( xc_hypercall_bounce_pre(xch, cc) )
> +        goto unlock_3;
>  
>      sysctl.cmd = XEN_SYSCTL_get_pmstat;
>      sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
>      sysctl.u.get_pmstat.cpuid = cpuid;
> +    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
> +    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
> +    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
>  
>      if ( (ret = xc_sysctl(xch, &sysctl)) )
> -        goto unlock_2;
> +        goto unlock_4;
>  
>      cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
>      cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
>      cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
> -    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
> -    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
> -    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
> -    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
> -    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
> -    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
> -    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
> +    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
> +    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
>  
> +unlock_4:
> +    xc_hypercall_bounce_post(xch, cc);
> +unlock_3:
> +    xc_hypercall_bounce_post(xch, pc);
>  unlock_2:
>      xc_hypercall_bounce_post(xch, residencies);
>  unlock_1:
> --- 2014-02-13.orig/tools/libxc/xenctrl.h	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xenctrl.h	2014-03-04 17:50:49.000000000 +0100
> @@ -1934,7 +1934,7 @@ int xc_pm_get_max_px(xc_interface *xch, 
>  int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct xc_px_stat *pxpt);
>  int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
>  
> -struct xc_cx_stat {
> +struct xc_cx_stat { /* DEPRECATED (use v2 below instead)! */
>      uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
>      uint32_t last;         /* last Cx state */
>      uint64_t idle_time;    /* idle time from boot */
> @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
>  };
>  typedef struct xc_cx_stat xc_cx_stat_t;
>  
> +struct xc_cx_stat_v2 {
> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
> +    uint32_t last;         /* last Cx state */
> +    uint64_t idle_time;    /* idle time from boot */
> +    uint64_t *triggers;    /* Cx trigger counts */
> +    uint64_t *residencies; /* Cx residencies */
> +    uint32_t nr_pc;        /* entry nr in pc[] */
> +    uint32_t nr_cc;        /* entry nr in cc[] */
> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
> +};
> +typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
> +
>  int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt);
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *);
>  int xc_pm_reset_cxstat(xc_interface *xch, int cpuid);
>  
>  int xc_cpu_online(xc_interface *xch, int cpu);
> --- 2014-02-13.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
> @@ -62,13 +62,17 @@
>  
>  #define GET_HW_RES_IN_NS(msr, val) \
>      do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
> -#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
> +#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
>  #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
>  #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
>  #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
> +#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
> +#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
> +#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
> +#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
>  #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
>  #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
> -#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
> +#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
>  
>  static void lapic_timer_nop(void) { }
>  void (*__read_mostly lapic_timer_off)(void);
> @@ -111,8 +115,13 @@ struct hw_residencies
>  {
>      uint64_t pc2;
>      uint64_t pc3;
> +    uint64_t pc4;
>      uint64_t pc6;
>      uint64_t pc7;
> +    uint64_t pc8;
> +    uint64_t pc9;
> +    uint64_t pc10;
> +    uint64_t cc1;
>      uint64_t cc3;
>      uint64_t cc6;
>      uint64_t cc7;
> @@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
>  
>      switch ( c->x86_model )
>      {
> +    /* 4th generation Intel Core (Haswell) */
> +    case 0x45:
> +        GET_PC8_RES(hw_res->pc8);
> +        GET_PC9_RES(hw_res->pc9);
> +        GET_PC10_RES(hw_res->pc10);
> +        /* fall through */
>      /* Sandy bridge */
>      case 0x2A:
>      case 0x2D:
> @@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
>      /* Haswell */
>      case 0x3C:
>      case 0x3F:
> -    case 0x45:
>      case 0x46:
>      /* future */
>      case 0x3D:
> @@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
>          GET_CC3_RES(hw_res->cc3);
>          GET_CC6_RES(hw_res->cc6);
>          break;
> +    /* various Atoms */
> +    case 0x27:
> +        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
> +        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        break;
> +    /* Silvermont */
> +    case 0x37:
> +    case 0x4A:
> +    case 0x4D:
> +    case 0x5A:
> +    case 0x5D:
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        GET_CC1_RES(hw_res->cc1);
> +        GET_CC6_RES(hw_res->cc6);
> +        break;
>      }
>  }
>  
> @@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
>  
>      get_hw_residencies(cpu, &hw_res);
>  
> -    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
> -           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
> -    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
> -           hw_res.cc3, hw_res.cc6,hw_res.cc7);
> +    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
> +           hw_res.pc2,
> +           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
> +           hw_res.pc6, hw_res.pc7);
> +    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
> +        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
> +               hw_res.pc8, hw_res.pc9, hw_res.pc10);
> +    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
> +           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
> +           hw_res.cc6, hw_res.cc7);
>  }
>  
>  static char* acpi_cstate_method_name[] =
> @@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>      struct acpi_processor_power *power = processor_powers[cpuid];
>      uint64_t idle_usage = 0, idle_res = 0;
>      uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
> -    int i;
> -    struct hw_residencies hw_res;
> +    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
>  
>      if ( power == NULL )
>      {
>          stat->last = 0;
>          stat->nr = 0;
>          stat->idle_time = 0;
> +        stat->nr_pc = 0;
> +        stat->nr_cc = 0;
>          return 0;
>      }
>  
>      stat->last = power->last_state ? power->last_state->idx : 0;
>      stat->idle_time = get_cpu_idle_time(cpuid);
> +    nr = min(stat->nr, power->count);
>  
>      /* mimic the stat when detail info hasn't been registered by dom0 */
>      if ( pm_idle_save == NULL )
> @@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>  
>          usage[1] = idle_usage = 1;
>          res[1] = idle_res = stat->idle_time;
> -
> -        memset(&hw_res, 0, sizeof(hw_res));
>      }
>      else
>      {
> +        struct hw_residencies hw_res;
> +
>          stat->nr = power->count;
>  
> -        for ( i = 1; i < power->count; i++ )
> +        for ( i = 1; i < nr; i++ )
>          {
>              spin_lock_irq(&power->stat_lock);
>              usage[i] = power->states[i].usage;
> @@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>          }
>  
>          get_hw_residencies(cpuid, &hw_res);
> +
> +#define PUT_xC(what, n) do { \
> +        if ( stat->nr_##what >= n && \
> +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> +            return -EFAULT; \
> +        if ( hw_res.what##n ) \
> +            nr_##what = n; \
> +    } while ( 0 )
> +#define PUT_PC(n) PUT_xC(pc, n)
> +        PUT_PC(2);
> +        PUT_PC(3);
> +        PUT_PC(4);
> +        PUT_PC(6);
> +        PUT_PC(7);
> +        PUT_PC(8);
> +        PUT_PC(9);
> +        PUT_PC(10);
> +#undef PUT_PC
> +#define PUT_CC(n) PUT_xC(cc, n)
> +        PUT_CC(1);
> +        PUT_CC(3);
> +        PUT_CC(6);
> +        PUT_CC(7);
> +#undef PUT_CC
> +#undef PUT_xC
>      }
>  
>      usage[0] = idle_usage;
>      res[0] = NOW() - idle_res;
>  
> -    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
> -         copy_to_guest(stat->residencies, res, stat->nr) )
> +    if ( copy_to_guest(stat->triggers, usage, nr) ||
> +         copy_to_guest(stat->residencies, res, nr) )
>          return -EFAULT;
>  
> -    stat->pc2 = hw_res.pc2;
> -    stat->pc3 = hw_res.pc3;
> -    stat->pc6 = hw_res.pc6;
> -    stat->pc7 = hw_res.pc7;
> -    stat->cc3 = hw_res.cc3;
> -    stat->cc6 = hw_res.cc6;
> -    stat->cc7 = hw_res.cc7;
> +    stat->nr_pc = nr_pc;
> +    stat->nr_cc = nr_cc;
>  
>      return 0;
>  }
> --- 2014-02-13.orig/xen/include/public/sysctl.h	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
> @@ -34,7 +34,7 @@
>  #include "xen.h"
>  #include "domctl.h"
>  
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
>  
>  /*
>   * Read console content from Xen buffer ring.
> @@ -226,13 +226,10 @@ struct pm_cx_stat {
>      uint64_aligned_t idle_time;                 /* idle time from boot */
>      XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
>      XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> -    uint64_aligned_t pc2;
> -    uint64_aligned_t pc3;
> -    uint64_aligned_t pc6;
> -    uint64_aligned_t pc7;
> -    uint64_aligned_t cc3;
> -    uint64_aligned_t cc6;
> -    uint64_aligned_t cc7;
> +    uint32_t nr_pc;                          /* entry nr in pc[] */
> +    uint32_t nr_cc;                          /* entry nr in cc[] */
> +    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
> +    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
>  };
>  
>  struct xen_sysctl_get_pmstat {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
  2014-03-05 10:42   ` Jan Beulich
@ 2014-03-05 15:07   ` Boris Ostrovsky
  2014-03-05 15:15     ` Jan Beulich
  2014-03-13 14:11   ` Ian Campbell
  2014-03-13 14:28   ` Keir Fraser
  3 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2014-03-05 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 14899 bytes --]

On 03/05/2014 05:37 AM, Jan Beulich wrote:
> With the number of these growing it becomes increasingly desirable to
> not repeatedly alter the sysctl interface to accommodate them. Replace
> the explicit listing of numbered states by arrays, unused fields of
> which will remain untouched by the hypercall.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch,
>   
>   int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>   {
> -    DECLARE_SYSCTL;
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +    uint64_t pc[7], cc[7];

Do you need pc[10]? There seem to exist pc8-10 states (at least there 
are references to them below for Haswell).

> +    struct xc_cx_stat_v2 cxpt2 = {
> +        .triggers = cxpt->triggers,
> +        .residencies = cxpt->residencies,
> +        .nr_pc = sizeof(pc) / sizeof(*pc),
> +        .nr_cc = sizeof(cc) / sizeof(*cc),
> +        .pc = pc,
> +        .cc = cc
> +    };
>       int max_cx, ret;
>   
>       if( !cxpt->triggers || !cxpt->residencies )
>           return -EINVAL;
>   
>       if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
> -        goto unlock_0;
> +        return ret;
>   
> -    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
> -    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
> +    cxpt2.nr = max_cx;
> +    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);

Why are you not returning on error here?

> +
> +    cxpt->nr = cxpt2.nr;
> +    cxpt->last = cxpt2.last;
> +    cxpt->idle_time = cxpt2.idle_time;
> +    cxpt->pc2 = pc[1];
> +    cxpt->pc3 = pc[2];
> +    cxpt->pc6 = pc[5];
> +    cxpt->pc7 = pc[6];
> +    cxpt->cc3 = cc[2];
> +    cxpt->cc6 = cc[5];
> +    cxpt->cc7 = cc[6];
> +
> +    return ret;
> +}
> +
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)
> +{
> +    DECLARE_SYSCTL;
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> +                                   cxpt->nr * sizeof(*cxpt->triggers),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> +                                   cxpt->nr * sizeof(*cxpt->residencies),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
> +                                   cxpt->nr_pc * sizeof(*cxpt->pc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
> +                                   cxpt->nr_cc * sizeof(*cxpt->cc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    int ret = -1;
>   
> -    ret = -1;
>       if ( xc_hypercall_bounce_pre(xch, triggers) )
>           goto unlock_0;
>       if ( xc_hypercall_bounce_pre(xch, residencies) )
>           goto unlock_1;
> +    if ( xc_hypercall_bounce_pre(xch, pc) )
> +        goto unlock_2;
> +    if ( xc_hypercall_bounce_pre(xch, cc) )
> +        goto unlock_3;
>   
>       sysctl.cmd = XEN_SYSCTL_get_pmstat;
>       sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
>       sysctl.u.get_pmstat.cpuid = cpuid;
> +    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
> +    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
> +    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
>       set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
>       set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
>   
>       if ( (ret = xc_sysctl(xch, &sysctl)) )
> -        goto unlock_2;
> +        goto unlock_4;
>   
>       cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
>       cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
>       cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
> -    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
> -    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
> -    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
> -    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
> -    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
> -    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
> -    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
> +    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
> +    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
>   
> +unlock_4:
> +    xc_hypercall_bounce_post(xch, cc);
> +unlock_3:
> +    xc_hypercall_bounce_post(xch, pc);
>   unlock_2:
>       xc_hypercall_bounce_post(xch, residencies);
>   unlock_1:
> --- 2014-02-13.orig/tools/libxc/xenctrl.h	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xenctrl.h	2014-03-04 17:50:49.000000000 +0100
> @@ -1934,7 +1934,7 @@ int xc_pm_get_max_px(xc_interface *xch,
>   int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct xc_px_stat *pxpt);
>   int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
>   
> -struct xc_cx_stat {
> +struct xc_cx_stat { /* DEPRECATED (use v2 below instead)! */
>       uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
>       uint32_t last;         /* last Cx state */
>       uint64_t idle_time;    /* idle time from boot */
> @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
>   };
>   typedef struct xc_cx_stat xc_cx_stat_t;
>   
> +struct xc_cx_stat_v2 {
> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
> +    uint32_t last;         /* last Cx state */
> +    uint64_t idle_time;    /* idle time from boot */
> +    uint64_t *triggers;    /* Cx trigger counts */
> +    uint64_t *residencies; /* Cx residencies */
> +    uint32_t nr_pc;        /* entry nr in pc[] */
> +    uint32_t nr_cc;        /* entry nr in cc[] */

Are these entry number or number of entries (or largest entry number) in 
appropriate array?

> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
> +};
> +typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
> +
>   int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
>   int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt);
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *);

You forgot last parameter's name.


-boris

>   int xc_pm_reset_cxstat(xc_interface *xch, int cpuid);
>   
>   int xc_cpu_online(xc_interface *xch, int cpu);
> --- 2014-02-13.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
> @@ -62,13 +62,17 @@
>   
>   #define GET_HW_RES_IN_NS(msr, val) \
>       do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
> -#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
> +#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
>   #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
>   #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
>   #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
> +#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
> +#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
> +#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
> +#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
>   #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
>   #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
> -#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
> +#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
>   
>   static void lapic_timer_nop(void) { }
>   void (*__read_mostly lapic_timer_off)(void);
> @@ -111,8 +115,13 @@ struct hw_residencies
>   {
>       uint64_t pc2;
>       uint64_t pc3;
> +    uint64_t pc4;
>       uint64_t pc6;
>       uint64_t pc7;
> +    uint64_t pc8;
> +    uint64_t pc9;
> +    uint64_t pc10;
> +    uint64_t cc1;
>       uint64_t cc3;
>       uint64_t cc6;
>       uint64_t cc7;
> @@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
>   
>       switch ( c->x86_model )
>       {
> +    /* 4th generation Intel Core (Haswell) */
> +    case 0x45:
> +        GET_PC8_RES(hw_res->pc8);
> +        GET_PC9_RES(hw_res->pc9);
> +        GET_PC10_RES(hw_res->pc10);
> +        /* fall through */
>       /* Sandy bridge */
>       case 0x2A:
>       case 0x2D:
> @@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
>       /* Haswell */
>       case 0x3C:
>       case 0x3F:
> -    case 0x45:
>       case 0x46:
>       /* future */
>       case 0x3D:
> @@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
>           GET_CC3_RES(hw_res->cc3);
>           GET_CC6_RES(hw_res->cc6);
>           break;
> +    /* various Atoms */
> +    case 0x27:
> +        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
> +        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        break;
> +    /* Silvermont */
> +    case 0x37:
> +    case 0x4A:
> +    case 0x4D:
> +    case 0x5A:
> +    case 0x5D:
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        GET_CC1_RES(hw_res->cc1);
> +        GET_CC6_RES(hw_res->cc6);
> +        break;
>       }
>   }
>   
> @@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
>   
>       get_hw_residencies(cpu, &hw_res);
>   
> -    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
> -           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
> -    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
> -           hw_res.cc3, hw_res.cc6,hw_res.cc7);
> +    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
> +           hw_res.pc2,
> +           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
> +           hw_res.pc6, hw_res.pc7);
> +    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
> +        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
> +               hw_res.pc8, hw_res.pc9, hw_res.pc10);
> +    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
> +           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
> +           hw_res.cc6, hw_res.cc7);
>   }
>   
>   static char* acpi_cstate_method_name[] =
> @@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>       struct acpi_processor_power *power = processor_powers[cpuid];
>       uint64_t idle_usage = 0, idle_res = 0;
>       uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
> -    int i;
> -    struct hw_residencies hw_res;
> +    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
>   
>       if ( power == NULL )
>       {
>           stat->last = 0;
>           stat->nr = 0;
>           stat->idle_time = 0;
> +        stat->nr_pc = 0;
> +        stat->nr_cc = 0;
>           return 0;
>       }
>   
>       stat->last = power->last_state ? power->last_state->idx : 0;
>       stat->idle_time = get_cpu_idle_time(cpuid);
> +    nr = min(stat->nr, power->count);
>   
>       /* mimic the stat when detail info hasn't been registered by dom0 */
>       if ( pm_idle_save == NULL )
> @@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>   
>           usage[1] = idle_usage = 1;
>           res[1] = idle_res = stat->idle_time;
> -
> -        memset(&hw_res, 0, sizeof(hw_res));
>       }
>       else
>       {
> +        struct hw_residencies hw_res;
> +
>           stat->nr = power->count;
>   
> -        for ( i = 1; i < power->count; i++ )
> +        for ( i = 1; i < nr; i++ )
>           {
>               spin_lock_irq(&power->stat_lock);
>               usage[i] = power->states[i].usage;
> @@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>           }
>   
>           get_hw_residencies(cpuid, &hw_res);
> +
> +#define PUT_xC(what, n) do { \
> +        if ( stat->nr_##what >= n && \
> +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> +            return -EFAULT; \
> +        if ( hw_res.what##n ) \
> +            nr_##what = n; \
> +    } while ( 0 )
> +#define PUT_PC(n) PUT_xC(pc, n)
> +        PUT_PC(2);
> +        PUT_PC(3);
> +        PUT_PC(4);
> +        PUT_PC(6);
> +        PUT_PC(7);
> +        PUT_PC(8);
> +        PUT_PC(9);
> +        PUT_PC(10);
> +#undef PUT_PC
> +#define PUT_CC(n) PUT_xC(cc, n)
> +        PUT_CC(1);
> +        PUT_CC(3);
> +        PUT_CC(6);
> +        PUT_CC(7);
> +#undef PUT_CC
> +#undef PUT_xC
>       }
>   
>       usage[0] = idle_usage;
>       res[0] = NOW() - idle_res;
>   
> -    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
> -         copy_to_guest(stat->residencies, res, stat->nr) )
> +    if ( copy_to_guest(stat->triggers, usage, nr) ||
> +         copy_to_guest(stat->residencies, res, nr) )
>           return -EFAULT;
>   
> -    stat->pc2 = hw_res.pc2;
> -    stat->pc3 = hw_res.pc3;
> -    stat->pc6 = hw_res.pc6;
> -    stat->pc7 = hw_res.pc7;
> -    stat->cc3 = hw_res.cc3;
> -    stat->cc6 = hw_res.cc6;
> -    stat->cc7 = hw_res.cc7;
> +    stat->nr_pc = nr_pc;
> +    stat->nr_cc = nr_cc;
>   
>       return 0;
>   }
> --- 2014-02-13.orig/xen/include/public/sysctl.h	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
> @@ -34,7 +34,7 @@
>   #include "xen.h"
>   #include "domctl.h"
>   
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
>   
>   /*
>    * Read console content from Xen buffer ring.
> @@ -226,13 +226,10 @@ struct pm_cx_stat {
>       uint64_aligned_t idle_time;                 /* idle time from boot */
>       XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
>       XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> -    uint64_aligned_t pc2;
> -    uint64_aligned_t pc3;
> -    uint64_aligned_t pc6;
> -    uint64_aligned_t pc7;
> -    uint64_aligned_t cc3;
> -    uint64_aligned_t cc6;
> -    uint64_aligned_t cc7;
> +    uint32_t nr_pc;                          /* entry nr in pc[] */
> +    uint32_t nr_cc;                          /* entry nr in cc[] */
> +    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
> +    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
>   };
>   
>   struct xen_sysctl_get_pmstat {


[-- Attachment #1.2: Type: text/html, Size: 15695 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 15:07   ` Boris Ostrovsky
@ 2014-03-05 15:15     ` Jan Beulich
  2014-03-05 15:30       ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 15:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

>>> On 05.03.14 at 16:07, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 03/05/2014 05:37 AM, Jan Beulich wrote:
>> With the number of these growing it becomes increasingly desirable to
>> not repeatedly alter the sysctl interface to accommodate them. Replace
>> the explicit listing of numbered states by arrays, unused fields of
>> which will remain untouched by the hypercall.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
>> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
>> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch,
>>   
>>   int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>>   {
>> -    DECLARE_SYSCTL;
>> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
>> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
>> +    uint64_t pc[7], cc[7];
> 
> Do you need pc[10]? There seem to exist pc8-10 states (at least there 
> are references to them below for Haswell).

Did you not realize that this is the compatibility wrapper around the
new function? There's no place for me to store pc8 and higher, so
why would I waste space to retrieve them from the hypervisor?

>> +    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);
> 
> Why are you not returning on error here?

Because I'm doing so ...

>> +
>> +    cxpt->nr = cxpt2.nr;
>> +    cxpt->last = cxpt2.last;
>> +    cxpt->idle_time = cxpt2.idle_time;
>> +    cxpt->pc2 = pc[1];
>> +    cxpt->pc3 = pc[2];
>> +    cxpt->pc6 = pc[5];
>> +    cxpt->pc7 = pc[6];
>> +    cxpt->cc3 = cc[2];
>> +    cxpt->cc6 = cc[5];
>> +    cxpt->cc7 = cc[6];
>> +
>> +    return ret;
>> +}

... here. Why make the code more complicated (with an additional
return path) without need?

>> @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
>>   };
>>   typedef struct xc_cx_stat xc_cx_stat_t;
>>   
>> +struct xc_cx_stat_v2 {
>> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
>> +    uint32_t last;         /* last Cx state */
>> +    uint64_t idle_time;    /* idle time from boot */
>> +    uint64_t *triggers;    /* Cx trigger counts */
>> +    uint64_t *residencies; /* Cx residencies */
>> +    uint32_t nr_pc;        /* entry nr in pc[] */
>> +    uint32_t nr_cc;        /* entry nr in cc[] */
> 
> Are these entry number or number of entries (or largest entry number) in 
> appropriate array?

Just like above (for "nr") - the number of entries in the arrays.

>> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
>> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */

The slightly unusual thing is the indexing into these arrays: entry 0
has data for C1, entry 1 for C2, etc.
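
To make the 1-biased convention concrete, here is a small sketch of a lookup
by state number. The helper residency_of() is hypothetical and not part of
the patch; it assumes nr is the number of valid entries in the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the residency of state number n from a
 * 1-biased array, where slot 0 holds state 1, slot 1 holds state 2, etc.
 * Returns 0 when n is 0 (no slot exists for it) or when n lies beyond
 * the nr valid entries.
 */
static uint64_t residency_of(const uint64_t *arr, uint32_t nr, unsigned int n)
{
    assert(arr != NULL);

    if ( n == 0 || n > nr )
        return 0;

    return arr[n - 1];
}
```

With a seven-entry pc[] array, residency_of(pc, 7, 2) reads pc[1], which is
the same slot the compatibility wrapper copies into pc2.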

>> +};
>> +typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
>> +
>>   int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
>>   int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt);
>> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *);
> 
> You forgot last parameter's name.

No - I stripped it in order to save me from having to wrap the line.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 15:15     ` Jan Beulich
@ 2014-03-05 15:30       ` Boris Ostrovsky
  0 siblings, 0 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2014-03-05 15:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

On 03/05/2014 10:15 AM, Jan Beulich wrote:
>>>> On 05.03.14 at 16:07, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> On 03/05/2014 05:37 AM, Jan Beulich wrote:
>>> With the number of these growing it becomes increasingly desirable to
>>> not repeatedly alter the sysctl interface to accommodate them. Replace
>>> the explicit listing of numbered states by arrays, unused fields of
>>> which will remain untouched by the hypercall.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
>>> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
>>> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch,
>>>    
>>>    int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>>>    {
>>> -    DECLARE_SYSCTL;
>>> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
>>> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
>>> +    uint64_t pc[7], cc[7];
>> Do you need pc[10]? There seem to exist pc8-10 states (at least there
>> are references to them below for Haswell).
> Did you not realize that this is the compatibility wrapper around the
> new function? There's no place for me to store pc8 and higher, so
> why would I waste space to retrieve them from the hypervisor?

No, I didn't realize this (but now I see it).

>
>>> @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
>>>    };
>>>    typedef struct xc_cx_stat xc_cx_stat_t;
>>>    
>>> +struct xc_cx_stat_v2 {
>>> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
>>> +    uint32_t last;         /* last Cx state */
>>> +    uint64_t idle_time;    /* idle time from boot */
>>> +    uint64_t *triggers;    /* Cx trigger counts */
>>> +    uint64_t *residencies; /* Cx residencies */
>>> +    uint32_t nr_pc;        /* entry nr in pc[] */
>>> +    uint32_t nr_cc;        /* entry nr in cc[] */
>> Are these entry number or number of entries (or largest entry number) in
>> appropriate array?
> Just like above (for "nr") - the number of entries in the arrays.
>
>>> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
>>> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
> The slightly unusual thing is the indexing into these arrays: entry 0
> has data for C1, entry 1 for C2, etc.

This I understand. I just think that "entry nr" (especially the fact
that the first word is singular) is confusing, and that includes the
comment for the first member.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
@ 2014-03-05 15:47   ` Boris Ostrovsky
  2014-03-05 15:53     ` Jan Beulich
  2014-03-13 14:12   ` Ian Campbell
  2014-03-18  2:45   ` Tian, Kevin
  2 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2014-03-05 15:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

On 03/05/2014 05:37 AM, Jan Beulich wrote:
> @@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
>   }
>   
>   static uint64_t usec_start, usec_end;
> -static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
> +static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
>   static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
>   static int *avgfreq;
>   static uint64_t *sum, *sum_cx, *sum_px;
> @@ -482,25 +497,26 @@ static void signal_int_handler(int signo
>               /* print out CC? and PC? */
>               for ( i = 0; i < socket_nr; i++ )
>               {
> +                unsigned int n;
>                   uint64_t res;
> +
>                   for ( j = 0; j <= info.max_cpu_index; j++ )
>                   {
>                       if ( cpu_to_socket[j] == socket_ids[i] )
>                           break;
>                   }
>                   printf("\nSocket %d\n", socket_ids[i]);
> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
> +                {
> +                    if ( n >= cxstat_end[j].nr_pc )
> +                        continue;
> +                    res = cxstat_end[j].pc[n];
> +                    if ( n < cxstat_start[j].nr_pc )
> +                        res -= cxstat_start[j].pc[n];

Is it possible to have  cxstat_end[j].nr_pc != cxstat_start[j].nr_pc ?

-boris

> +                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                           n + 1, res / 1000000UL,
> +                           100UL * res / (double)sum_cx[j]);
> +                }
>                   for ( k = 0; k < core_nr; k++ )
>                   {
>                       for ( j = 0; j <= info.max_cpu_index; j++ )
> @@ -510,15 +526,17 @@ static void signal_int_handler(int signo
>                               break;
>                       }
>                       printf("\t Core %d CPU %d\n", core_ids[k], j);
> -                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
> -                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
> -                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
> -                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> +                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
> +                    {
> +                        if ( n >= cxstat_end[j].nr_cc )
> +                            continue;
> +                        res = cxstat_end[j].cc[n];
> +                        if ( n < cxstat_start[j].nr_cc )
> +                            res -= cxstat_start[j].cc[n];
> +                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                               n + 1, res / 1000000UL,
> +                               100UL * res / (double)sum_cx[j]);
> +                    }
>                   }
>               }
>           }
> @@ -529,6 +547,8 @@ static void signal_int_handler(int signo
>       {
>           free(cxstat[i].triggers);
>           free(cxstat[i].residencies);
> +        free(cxstat[i].pc);
> +        free(cxstat[i].cc);
>           free(pxstat[i].trans_pt);
>           free(pxstat[i].pt);
>       }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 15:47   ` Boris Ostrovsky
@ 2014-03-05 15:53     ` Jan Beulich
  2014-03-05 17:05       ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-05 15:53 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

>>> On 05.03.14 at 16:47, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 03/05/2014 05:37 AM, Jan Beulich wrote:
>> @@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
>>   }
>>   
>>   static uint64_t usec_start, usec_end;
>> -static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
>> +static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
>>   static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
>>   static int *avgfreq;
>>   static uint64_t *sum, *sum_cx, *sum_px;
>> @@ -482,25 +497,26 @@ static void signal_int_handler(int signo
>>               /* print out CC? and PC? */
>>               for ( i = 0; i < socket_nr; i++ )
>>               {
>> +                unsigned int n;
>>                   uint64_t res;
>> +
>>                   for ( j = 0; j <= info.max_cpu_index; j++ )
>>                   {
>>                       if ( cpu_to_socket[j] == socket_ids[i] )
>>                           break;
>>                   }
>>                   printf("\nSocket %d\n", socket_ids[i]);
>> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
>> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>> -                       100UL * res / (double)sum_cx[j]);
>> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
>> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>> -                       100UL * res / (double)sum_cx[j]);
>> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
>> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>> -                       100UL * res / (double)sum_cx[j]);
>> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
>> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>> -                       100UL * res / (double)sum_cx[j]);
>> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
>> +                {
>> +                    if ( n >= cxstat_end[j].nr_pc )
>> +                        continue;
>> +                    res = cxstat_end[j].pc[n];
>> +                    if ( n < cxstat_start[j].nr_pc )
>> +                        res -= cxstat_start[j].pc[n];
> 
> Is it possible to have  cxstat_end[j].nr_pc != cxstat_start[j].nr_pc ?

Yes - see the previous patch: It bumps the count only if the
respective hw_res field was non-zero.

But even if the current implementation didn't allow for this, I'd
still consider it good practice to cope with the possibility.
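
As a minimal sketch of that defensive handling (not the xenpm code itself): the struct and function names below are hypothetical stand-ins for the pc[]/nr_pc pair of struct xc_cx_stat_v2, and the logic mirrors the loop body quoted above. A state that the start sample does not cover is treated as having started at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pc[]/nr_pc pair of struct xc_cx_stat_v2. */
struct pkg_res {
    uint32_t nr_pc;      /* number of valid entries in pc[] */
    const uint64_t *pc;  /* residency counters, index 0 holds PC1 */
};

/*
 * Residency accumulated in state n between two samples.
 * Returns 0 when the end sample does not cover state n, and treats a
 * state missing from the start sample as having started at zero.
 */
static uint64_t pkg_res_delta(const struct pkg_res *start,
                              const struct pkg_res *end, uint32_t n)
{
    uint64_t res;

    assert(start && end);
    if ( n >= end->nr_pc )
        return 0;
    res = end->pc[n];
    if ( n < start->nr_pc )
        res -= start->pc[n];
    return res;
}
```

With a start sample reporting two states and an end sample reporting three, the third state's delta is simply its end value rather than an out-of-bounds read of the start array.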

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 15:53     ` Jan Beulich
@ 2014-03-05 17:05       ` Boris Ostrovsky
  2014-03-06  9:37         ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2014-03-05 17:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

On 03/05/2014 10:53 AM, Jan Beulich wrote:
>>>> On 05.03.14 at 16:47, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> On 03/05/2014 05:37 AM, Jan Beulich wrote:
>>> @@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
>>>    }
>>>    
>>>    static uint64_t usec_start, usec_end;
>>> -static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
>>> +static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
>>>    static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
>>>    static int *avgfreq;
>>>    static uint64_t *sum, *sum_cx, *sum_px;
>>> @@ -482,25 +497,26 @@ static void signal_int_handler(int signo
>>>                /* print out CC? and PC? */
>>>                for ( i = 0; i < socket_nr; i++ )
>>>                {
>>> +                unsigned int n;
>>>                    uint64_t res;
>>> +
>>>                    for ( j = 0; j <= info.max_cpu_index; j++ )
>>>                    {
>>>                        if ( cpu_to_socket[j] == socket_ids[i] )
>>>                            break;
>>>                    }
>>>                    printf("\nSocket %d\n", socket_ids[i]);
>>> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
>>> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>>> -                       100UL * res / (double)sum_cx[j]);
>>> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
>>> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>>> -                       100UL * res / (double)sum_cx[j]);
>>> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
>>> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>>> -                       100UL * res / (double)sum_cx[j]);
>>> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
>>> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
>>> -                       100UL * res / (double)sum_cx[j]);
>>> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
>>> +                {
>>> +                    if ( n >= cxstat_end[j].nr_pc )
>>> +                        continue;
>>> +                    res = cxstat_end[j].pc[n];
>>> +                    if ( n < cxstat_start[j].nr_pc )
>>> +                        res -= cxstat_start[j].pc[n];
>> Is it possible to have  cxstat_end[j].nr_pc != cxstat_start[j].nr_pc ?
> Yes - see the previous patch: It bumps the count only if the
> respective hw_res field was non-zero.

You mean this?

+
+#define PUT_xC(what, n) do { \
+        if ( stat->nr_##what >= n && \
+             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
+            return -EFAULT; \
+        if ( hw_res.what##n ) \
+            nr_##what = n; \
+    } while ( 0 )
+#define PUT_PC(n) PUT_xC(pc, n)

This reminds me of another question I had about this patch: this 
fragment appears to assume that you call it in order. In other words, 
will it work as intended if your call sequence is

PUT_PC(10)
..
PUT_PC(1)

-boris


>
> But even if the current implementation didn't allow for this, I'd
> still consider it good practice to cope with the possibility.
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 17:05       ` Boris Ostrovsky
@ 2014-03-06  9:37         ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-06  9:37 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Donald D Dugger,
	Jun Nakajima, xen-devel

>>> On 05.03.14 at 18:05, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 03/05/2014 10:53 AM, Jan Beulich wrote:
>>>>> On 05.03.14 at 16:47, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> On 03/05/2014 05:37 AM, Jan Beulich wrote:
>>>> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
>>>> +                {
>>>> +                    if ( n >= cxstat_end[j].nr_pc )
>>>> +                        continue;
>>>> +                    res = cxstat_end[j].pc[n];
>>>> +                    if ( n < cxstat_start[j].nr_pc )
>>>> +                        res -= cxstat_start[j].pc[n];
>>> Is it possible to have  cxstat_end[j].nr_pc != cxstat_start[j].nr_pc ?
>> Yes - see the previous patch: It bumps the count only if the
>> respective hw_res field was non-zero.
> 
> You mean this?

Yes.

> +
> +#define PUT_xC(what, n) do { \
> +        if ( stat->nr_##what >= n && \
> +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> +            return -EFAULT; \
> +        if ( hw_res.what##n ) \
> +            nr_##what = n; \
> +    } while ( 0 )
> +#define PUT_PC(n) PUT_xC(pc, n)
> 
> This reminds me of another question I had about this patch: this 
> fragment appears to assume that you call it in order.

Right. A pretty trivial requirement on the use of these
scope-restricted macros.
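
To illustrate why the order matters, here is a reduced sketch, assuming a hypothetical NOTE_PC() macro that keeps only the count-updating half of PUT_xC() (the copy_to_guest_offset() part is left out, and struct hw_res_demo is invented for the example). Called in ascending order the count ends at the highest non-zero state; called in descending order a lower state overwrites it.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the hypervisor's hw_res fields. */
struct hw_res_demo {
    uint64_t pc1, pc2, pc3;
};

/* Record n as the count if state n has a non-zero residency. */
#define NOTE_PC(res, count, n) do { \
        if ( (res)->pc##n ) \
            (count) = n; \
    } while ( 0 )

/* Ascending order: the count ends up as the highest non-zero state. */
static unsigned int count_ascending(const struct hw_res_demo *res)
{
    unsigned int nr_pc = 0;

    NOTE_PC(res, nr_pc, 1);
    NOTE_PC(res, nr_pc, 2);
    NOTE_PC(res, nr_pc, 3);
    return nr_pc;
}

/* Descending order: a lower non-zero state overwrites a higher one. */
static unsigned int count_descending(const struct hw_res_demo *res)
{
    unsigned int nr_pc = 0;

    NOTE_PC(res, nr_pc, 3);
    NOTE_PC(res, nr_pc, 2);
    NOTE_PC(res, nr_pc, 1);
    return nr_pc;
}
```

Since the macros are only used within one function, keeping the calls in ascending order is easy to guarantee by inspection.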

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Ping: [PATCH 0/3] x86: support further Intel CPU families
  2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
                   ` (2 preceding siblings ...)
  2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
@ 2014-03-12  9:38 ` Jan Beulich
  2014-03-12 10:18   ` Ian Campbell
  2014-03-17 13:28 ` [PATCH v2 0/2] " Jan Beulich
  4 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-12  9:38 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson, Donald D Dugger, Jun Nakajima,
	Keir Fraser
  Cc: xen-devel

>>> On 05.03.14 at 11:34, "Jan Beulich" <JBeulich@suse.com> wrote:
> 1: x86: Intel CPU family update
> 2: x86/idle: update to include further package/core residency MSRs
> 3: xenpm: use new Cx statistics interface
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Ping: [PATCH 0/3] x86: support further Intel CPU families
  2014-03-12  9:38 ` Ping: [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
@ 2014-03-12 10:18   ` Ian Campbell
  0 siblings, 0 replies; 31+ messages in thread
From: Ian Campbell @ 2014-03-12 10:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger,
	xen-devel

On Wed, 2014-03-12 at 09:38 +0000, Jan Beulich wrote:
> >>> On 05.03.14 at 11:34, "Jan Beulich" <JBeulich@suse.com> wrote:
> > 1: x86: Intel CPU family update
> > 2: x86/idle: update to include further package/core residency MSRs
> > 3: xenpm: use new Cx statistics interface
> > 
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>

Sorry, I discounted this as an x86 hypervisor patch but in reality there
is a lot of tools stuff.

I've added it to my queue, which is a bit backlogged but I'm hoping to
make a start on it this afternoon after I've processed my expenses from
my recent trip...

Ian.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
  2014-03-05 10:42   ` Jan Beulich
  2014-03-05 15:07   ` Boris Ostrovsky
@ 2014-03-13 14:11   ` Ian Campbell
  2014-03-13 14:27     ` Jan Beulich
  2014-03-18 16:18     ` Ian Jackson
  2014-03-13 14:28   ` Keir Fraser
  3 siblings, 2 replies; 31+ messages in thread
From: Ian Campbell @ 2014-03-13 14:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima,
	Donald D Dugger

On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
> With the number of these growing it becomes increasingly desirable to
> not repeatedly alter the sysctl interface to accommodate them. Replace
> the explicit listing of numbered states by arrays,

I don't have much of an opinion on the hypercall interface, so I'm just
taking that as a given and looking at the tools side accordingly.

> unused fields of which will remain untouched by the hypercall.

Is the caller supposed to initialise them to some known sentinel, or are
the valid entries identified somewhere else? (Sorry, I don't know much
about x86 pm.)

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch, 
>  
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
> [...]
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)

That is an incredibly subtle difference in the naming!

If v1 is considered deprecated then let's just get rid of it. There's
only one caller which you update in the next patch. I'd be perfectly
happy to have those collapsed and for the old interface to go away
immediately (or do it in patch 4/3 if you prefer).

> +{
> +    DECLARE_SYSCTL;
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> +                                   cxpt->nr * sizeof(*cxpt->triggers),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> +                                   cxpt->nr * sizeof(*cxpt->residencies),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);

The original had these as BOUNCE_BOTH. If that was wrong and this
change was therefore intentional (which I suspect is the case), please
note it in the commit message.

Ian.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
  2014-03-05 15:47   ` Boris Ostrovsky
@ 2014-03-13 14:12   ` Ian Campbell
  2014-03-18  2:45   ` Tian, Kevin
  2 siblings, 0 replies; 31+ messages in thread
From: Ian Campbell @ 2014-03-13 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima,
	Donald D Dugger

On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Apart from wishing to remove v1 and drop the v2 suffix in the new
interface this looks fine to me.

> 
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -29,6 +29,9 @@
>  #include <inttypes.h>
>  #include <sys/time.h>
>  
> +#define MAX_PKG_RESIDENCIES 12
> +#define MAX_CORE_RESIDENCIES 8
> +
>  #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
>  
>  static xc_interface *xc_handle;
> @@ -100,9 +103,9 @@ static void parse_cpuid_and_int(int argc
>      }
>  }
>  
> -static void print_cxstat(int cpuid, struct xc_cx_stat *cxstat)
> +static void print_cxstat(int cpuid, const struct xc_cx_stat_v2 *cxstat)
>  {
> -    int i;
> +    unsigned int i;
>  
>      printf("cpu id               : %d\n", cpuid);
>      printf("total C-states       : %d\n", cxstat->nr);
> @@ -115,22 +118,20 @@ static void print_cxstat(int cpuid, stru
>          printf("                       residency  [%20"PRIu64" ms]\n",
>                 cxstat->residencies[i]/1000000UL);
>      }
> -    printf("pc2                  : [%20"PRIu64" ms]\n"
> -           "pc3                  : [%20"PRIu64" ms]\n"
> -           "pc6                  : [%20"PRIu64" ms]\n"
> -           "pc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->pc2/1000000UL, cxstat->pc3/1000000UL,
> -            cxstat->pc6/1000000UL, cxstat->pc7/1000000UL);
> -    printf("cc3                  : [%20"PRIu64" ms]\n"
> -           "cc6                  : [%20"PRIu64" ms]\n"
> -           "cc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->cc3/1000000UL, cxstat->cc6/1000000UL,
> -            cxstat->cc7/1000000UL);
> +    for ( i = 0; i < MAX_PKG_RESIDENCIES && i < cxstat->nr_pc; ++i )
> +        if ( cxstat->pc[i] )
> +           printf("pc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->pc[i] / 1000000UL);
> +    for ( i = 0; i < MAX_CORE_RESIDENCIES && i < cxstat->nr_cc; ++i )
> +        if ( cxstat->cc[i] )
> +           printf("cc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->cc[i] / 1000000UL);
>      printf("\n");
>  }
>  
>  /* show cpu idle information on CPU cpuid */
> -static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid, struct xc_cx_stat *cxstat)
> +static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid,
> +                               struct xc_cx_stat_v2 *cxstat)
>  {
>      int ret = 0;
>      int max_cx_num = 0;
> @@ -145,24 +146,36 @@ static int get_cxstat_by_cpuid(xc_interf
>      if ( !max_cx_num )
>          return -ENODEV;
>  
> -    cxstat->triggers = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->triggers )
> -        return -ENOMEM;
> -    cxstat->residencies = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->residencies )
> +    cxstat->triggers = malloc(max_cx_num * sizeof(*cxstat->triggers));
> +    cxstat->residencies = malloc(max_cx_num * sizeof(*cxstat->residencies));
> +    cxstat->pc = malloc(MAX_PKG_RESIDENCIES * sizeof(*cxstat->pc));
> +    cxstat->cc = malloc(MAX_CORE_RESIDENCIES * sizeof(*cxstat->cc));
> +    if ( !cxstat->triggers || !cxstat->residencies ||
> +         !cxstat->pc || !cxstat->cc )
>      {
> +        free(cxstat->cc);
> +        free(cxstat->pc);
> +        free(cxstat->residencies);
>          free(cxstat->triggers);
>          return -ENOMEM;
>      }
>  
> -    ret = xc_pm_get_cxstat(xc_handle, cpuid, cxstat);
> +    cxstat->nr = max_cx_num;
> +    cxstat->nr_pc = MAX_PKG_RESIDENCIES;
> +    cxstat->nr_cc = MAX_CORE_RESIDENCIES;
> +
> +    ret = xc_pm_get_cx_stat(xc_handle, cpuid, cxstat);
>      if( ret )
>      {
>          ret = -errno;
>          free(cxstat->triggers);
>          free(cxstat->residencies);
> +        free(cxstat->pc);
> +        free(cxstat->cc);
>          cxstat->triggers = NULL;
>          cxstat->residencies = NULL;
> +        cxstat->pc = NULL;
> +        cxstat->cc = NULL;
>      }
>  
>      return ret;
> @@ -183,7 +196,7 @@ static int show_max_cstate(xc_interface 
>  static int show_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid)
>  {
>      int ret = 0;
> -    struct xc_cx_stat cxstatinfo;
> +    struct xc_cx_stat_v2 cxstatinfo;
>  
>      ret = get_cxstat_by_cpuid(xc_handle, cpuid, &cxstatinfo);
>      if ( ret )
> @@ -198,6 +211,8 @@ static int show_cxstat_by_cpuid(xc_inter
>  
>      free(cxstatinfo.triggers);
>      free(cxstatinfo.residencies);
> +    free(cxstatinfo.pc);
> +    free(cxstatinfo.cc);
>      return 0;
>  }
>  
> @@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
>  }
>  
>  static uint64_t usec_start, usec_end;
> -static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
> +static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
>  static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
>  static int *avgfreq;
>  static uint64_t *sum, *sum_cx, *sum_px;
> @@ -482,25 +497,26 @@ static void signal_int_handler(int signo
>              /* print out CC? and PC? */
>              for ( i = 0; i < socket_nr; i++ )
>              {
> +                unsigned int n;
>                  uint64_t res;
> +
>                  for ( j = 0; j <= info.max_cpu_index; j++ )
>                  {
>                      if ( cpu_to_socket[j] == socket_ids[i] )
>                          break;
>                  }
>                  printf("\nSocket %d\n", socket_ids[i]);
> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
> +                {
> +                    if ( n >= cxstat_end[j].nr_pc )
> +                        continue;
> +                    res = cxstat_end[j].pc[n];
> +                    if ( n < cxstat_start[j].nr_pc )
> +                        res -= cxstat_start[j].pc[n];
> +                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                           n + 1, res / 1000000UL,
> +                           100UL * res / (double)sum_cx[j]);
> +                }
>                  for ( k = 0; k < core_nr; k++ )
>                  {
>                      for ( j = 0; j <= info.max_cpu_index; j++ )
> @@ -510,15 +526,17 @@ static void signal_int_handler(int signo
>                              break;
>                      }
>                      printf("\t Core %d CPU %d\n", core_ids[k], j);
> -                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
> -                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
> -                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
> -                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> +                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
> +                    {
> +                        if ( n >= cxstat_end[j].nr_cc )
> +                            continue;
> +                        res = cxstat_end[j].cc[n];
> +                        if ( n < cxstat_start[j].nr_cc )
> +                            res -= cxstat_start[j].cc[n];
> +                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                               n + 1, res / 1000000UL,
> +                               100UL * res / (double)sum_cx[j]);
> +                    }
>                  }
>              }
>          }
> @@ -529,6 +547,8 @@ static void signal_int_handler(int signo
>      {
>          free(cxstat[i].triggers);
>          free(cxstat[i].residencies);
> +        free(cxstat[i].pc);
> +        free(cxstat[i].cc);
>          free(pxstat[i].trans_pt);
>          free(pxstat[i].pt);
>      }
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-13 14:11   ` Ian Campbell
@ 2014-03-13 14:27     ` Jan Beulich
  2014-03-13 15:34       ` Ian Campbell
  2014-03-18 16:18     ` Ian Jackson
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-13 14:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger

>>> On 13.03.14 at 15:11, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
>> With the number of these growing it becomes increasingly desirable to
>> not repeatedly alter the sysctl interface to accommodate them. Replace
>> the explicit listing of numbered states by arrays,
> 
> I don't have much of an opinion on the hypercall interface, so I'm just
> taking that as a given and looking at the tools side accordingly.
> 
>> unused fields of which will remain untouched by the hypercall.
> 
> Are you supposed to initialise them to some known sentinel or are the
> valid entries identified somewhere else (sorry, don't know much about
> x86 pm).

The best thing for the caller would be to zero the whole buffer.
But other known out-of-range values (like all ones) would do too.
In the end it's up to the caller to pre-fill the array suitably for it
to recognize valid fields.
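
A minimal sketch of the zero-fill approach, assuming a hypothetical demo_fill() in place of the real hypercall (it writes only the entries it knows about and leaves the rest untouched) and a DEMO_MAX_PC constant mirroring xenpm's MAX_PKG_RESIDENCIES:

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PC 12  /* mirrors MAX_PKG_RESIDENCIES in xenpm */

/* Hypothetical stand-in for the hypercall: fills only the fields it knows. */
static void demo_fill(uint64_t *pc, unsigned int nr)
{
    if ( nr > 1 )
        pc[1] = 1000;   /* PC2 */
    if ( nr > 5 )
        pc[5] = 2500;   /* PC6 */
}

/* Zero the buffer first, then count the entries that were written. */
static unsigned int demo_count_valid(void)
{
    uint64_t pc[DEMO_MAX_PC];
    unsigned int i, valid = 0;

    memset(pc, 0, sizeof(pc));      /* pre-fill: zero means "not reported" */
    demo_fill(pc, DEMO_MAX_PC);
    for ( i = 0; i < DEMO_MAX_PC; ++i )
        if ( pc[i] )
            ++valid;
    return valid;
}
```

One consequence of choosing zero as the sentinel is that a supported state with no residency yet looks the same as an unsupported one; an all-ones pre-fill would keep the two apart.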

>> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
>> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
>> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch, 
>>  
>>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>> [...]
>> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)
> 
> That is an incredibly subtle difference in the naming!
> 
> If v1 is considered deprecated then let's just get rid of it. There's
> only one caller which you update in the next patch. I'd be perfectly
> happy to have those collapsed and for the old interface to go away
> immediately (or do it in patch 4/3 if you prefer).

I actually meant to raise the question of deprecation vs deletion/
replacement, but then forgot. So are you saying then that deleting/
replacing a libxc interface is not an issue, i.e. there's no need for
API compatibility? If so I'd indeed prefer to merge this and the 3rd
patch and simply adjust the structure definition and function
implementation, without any v2 or other subtleties.

>> +{
>> +    DECLARE_SYSCTL;
>> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
>> +                                   cxpt->nr * sizeof(*cxpt->triggers),
>> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
>> +                                   cxpt->nr * sizeof(*cxpt->residencies),
>> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> 
> The original had these as BOUNCE_BOTH. If that was wrong  and this
> change was therefore intentional (which I suspect is the case) please
> note it in the commit message.

Added for v2 (subject to an answer to the above).

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
                     ` (2 preceding siblings ...)
  2014-03-13 14:11   ` Ian Campbell
@ 2014-03-13 14:28   ` Keir Fraser
  3 siblings, 0 replies; 31+ messages in thread
From: Keir Fraser @ 2014-03-13 14:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jun Nakajima, xen-devel, Ian Jackson, Ian Campbell,
	Donald D Dugger


[-- Attachment #1.1: Type: text/plain, Size: 14404 bytes --]

On Wed, Mar 5, 2014 at 10:37 AM, Jan Beulich <JBeulich@suse.com> wrote:

> With the number of these growing it becomes increasingly desirable to
> not repeatedly alter the sysctl interface to accommodate them. Replace
> the explicit listing of numbered states by arrays, unused fields of
> which will remain untouched by the hypercall.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>

Acked-by: Keir Fraser <keir@xen.org>


>
> --- 2014-02-13.orig/tools/libxc/xc_pm.c 2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xc_pm.c      2014-03-05 08:54:58.000000000 +0100
> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch,
>
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat
> *cxpt)
>  {
> -    DECLARE_SYSCTL;
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0,
> XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0,
> XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +    uint64_t pc[7], cc[7];
> +    struct xc_cx_stat_v2 cxpt2 = {
> +        .triggers = cxpt->triggers,
> +        .residencies = cxpt->residencies,
> +        .nr_pc = sizeof(pc) / sizeof(*pc),
> +        .nr_cc = sizeof(cc) / sizeof(*cc),
> +        .pc = pc,
> +        .cc = cc
> +    };
>      int max_cx, ret;
>
>      if( !cxpt->triggers || !cxpt->residencies )
>          return -EINVAL;
>
>      if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
> -        goto unlock_0;
> +        return ret;
>
> -    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
> -    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
> +    cxpt2.nr = max_cx;
> +    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);
> +
> +    cxpt->nr = cxpt2.nr;
> +    cxpt->last = cxpt2.last;
> +    cxpt->idle_time = cxpt2.idle_time;
> +    cxpt->pc2 = pc[1];
> +    cxpt->pc3 = pc[2];
> +    cxpt->pc6 = pc[5];
> +    cxpt->pc7 = pc[6];
> +    cxpt->cc3 = cc[2];
> +    cxpt->cc6 = cc[5];
> +    cxpt->cc7 = cc[6];
> +
> +    return ret;
> +}
> +
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2
> *cxpt)
> +{
> +    DECLARE_SYSCTL;
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> +                                   cxpt->nr * sizeof(*cxpt->triggers),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> +                                   cxpt->nr * sizeof(*cxpt->residencies),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
> +                                   cxpt->nr_pc * sizeof(*cxpt->pc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
> +                                   cxpt->nr_cc * sizeof(*cxpt->cc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    int ret = -1;
>
> -    ret = -1;
>      if ( xc_hypercall_bounce_pre(xch, triggers) )
>          goto unlock_0;
>      if ( xc_hypercall_bounce_pre(xch, residencies) )
>          goto unlock_1;
> +    if ( xc_hypercall_bounce_pre(xch, pc) )
> +        goto unlock_2;
> +    if ( xc_hypercall_bounce_pre(xch, cc) )
> +        goto unlock_3;
>
>      sysctl.cmd = XEN_SYSCTL_get_pmstat;
>      sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
>      sysctl.u.get_pmstat.cpuid = cpuid;
> +    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
> +    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
> +    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies,
> residencies);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
>
>      if ( (ret = xc_sysctl(xch, &sysctl)) )
> -        goto unlock_2;
> +        goto unlock_4;
>
>      cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
>      cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
>      cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
> -    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
> -    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
> -    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
> -    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
> -    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
> -    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
> -    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
> +    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
> +    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
>
> +unlock_4:
> +    xc_hypercall_bounce_post(xch, cc);
> +unlock_3:
> +    xc_hypercall_bounce_post(xch, pc);
>  unlock_2:
>      xc_hypercall_bounce_post(xch, residencies);
>  unlock_1:
> --- 2014-02-13.orig/tools/libxc/xenctrl.h       2014-03-04
> 17:43:06.000000000 +0100
> +++ 2014-02-13/tools/libxc/xenctrl.h    2014-03-04 17:50:49.000000000 +0100
> @@ -1934,7 +1934,7 @@ int xc_pm_get_max_px(xc_interface *xch,
>  int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct xc_px_stat
> *pxpt);
>  int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
>
> -struct xc_cx_stat {
> +struct xc_cx_stat { /* DEPRECATED (use v2 below instead)! */
>      uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
>      uint32_t last;         /* last Cx state */
>      uint64_t idle_time;    /* idle time from boot */
> @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
>  };
>  typedef struct xc_cx_stat xc_cx_stat_t;
>
> +struct xc_cx_stat_v2 {
> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl
> C0 */
> +    uint32_t last;         /* last Cx state */
> +    uint64_t idle_time;    /* idle time from boot */
> +    uint64_t *triggers;    /* Cx trigger counts */
> +    uint64_t *residencies; /* Cx residencies */
> +    uint32_t nr_pc;        /* entry nr in pc[] */
> +    uint32_t nr_cc;        /* entry nr in cc[] */
> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
> +};
> +typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
> +
>  int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat
> *cxpt);
> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2
> *);
>  int xc_pm_reset_cxstat(xc_interface *xch, int cpuid);
>
>  int xc_cpu_online(xc_interface *xch, int cpu);
> --- 2014-02-13.orig/xen/arch/x86/acpi/cpu_idle.c        2014-03-04
> 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/arch/x86/acpi/cpu_idle.c     2014-03-04
> 17:38:39.000000000 +0100
> @@ -62,13 +62,17 @@
>
>  #define GET_HW_RES_IN_NS(msr, val) \
>      do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
> -#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
> +#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
>  #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
>  #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
>  #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
> +#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells
> only */
> +#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells
> only */
> +#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells
> only */
> +#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only
> */
>  #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
>  #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
> -#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
> +#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
>
>  static void lapic_timer_nop(void) { }
>  void (*__read_mostly lapic_timer_off)(void);
> @@ -111,8 +115,13 @@ struct hw_residencies
>  {
>      uint64_t pc2;
>      uint64_t pc3;
> +    uint64_t pc4;
>      uint64_t pc6;
>      uint64_t pc7;
> +    uint64_t pc8;
> +    uint64_t pc9;
> +    uint64_t pc10;
> +    uint64_t cc1;
>      uint64_t cc3;
>      uint64_t cc6;
>      uint64_t cc7;
> @@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
>
>      switch ( c->x86_model )
>      {
> +    /* 4th generation Intel Core (Haswell) */
> +    case 0x45:
> +        GET_PC8_RES(hw_res->pc8);
> +        GET_PC9_RES(hw_res->pc9);
> +        GET_PC10_RES(hw_res->pc10);
> +        /* fall through */
>      /* Sandy bridge */
>      case 0x2A:
>      case 0x2D:
> @@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
>      /* Haswell */
>      case 0x3C:
>      case 0x3F:
> -    case 0x45:
>      case 0x46:
>      /* future */
>      case 0x3D:
> @@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
>          GET_CC3_RES(hw_res->cc3);
>          GET_CC6_RES(hw_res->cc6);
>          break;
> +    /* various Atoms */
> +    case 0x27:
> +        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
> +        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        break;
> +    /* Silvermont */
> +    case 0x37:
> +    case 0x4A:
> +    case 0x4D:
> +    case 0x5A:
> +    case 0x5D:
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        GET_CC1_RES(hw_res->cc1);
> +        GET_CC6_RES(hw_res->cc6);
> +        break;
>      }
>  }
>
> @@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
>
>      get_hw_residencies(cpu, &hw_res);
>
> -    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
> -           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
> -    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
> -           hw_res.cc3, hw_res.cc6,hw_res.cc7);
> +    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
> +           hw_res.pc2,
> +           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
> +           hw_res.pc6, hw_res.pc7);
> +    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
> +        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
> +               hw_res.pc8, hw_res.pc9, hw_res.pc10);
> +    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
> +           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
> +           hw_res.cc6, hw_res.cc7);
>  }
>
>  static char* acpi_cstate_method_name[] =
> @@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>      struct acpi_processor_power *power = processor_powers[cpuid];
>      uint64_t idle_usage = 0, idle_res = 0;
>      uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
> -    int i;
> -    struct hw_residencies hw_res;
> +    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
>
>      if ( power == NULL )
>      {
>          stat->last = 0;
>          stat->nr = 0;
>          stat->idle_time = 0;
> +        stat->nr_pc = 0;
> +        stat->nr_cc = 0;
>          return 0;
>      }
>
>      stat->last = power->last_state ? power->last_state->idx : 0;
>      stat->idle_time = get_cpu_idle_time(cpuid);
> +    nr = min(stat->nr, power->count);
>
>      /* mimic the stat when detail info hasn't been registered by dom0 */
>      if ( pm_idle_save == NULL )
> @@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>
>          usage[1] = idle_usage = 1;
>          res[1] = idle_res = stat->idle_time;
> -
> -        memset(&hw_res, 0, sizeof(hw_res));
>      }
>      else
>      {
> +        struct hw_residencies hw_res;
> +
>          stat->nr = power->count;
>
> -        for ( i = 1; i < power->count; i++ )
> +        for ( i = 1; i < nr; i++ )
>          {
>              spin_lock_irq(&power->stat_lock);
>              usage[i] = power->states[i].usage;
> @@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>          }
>
>          get_hw_residencies(cpuid, &hw_res);
> +
> +#define PUT_xC(what, n) do { \
> +        if ( stat->nr_##what >= n && \
> +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> +            return -EFAULT; \
> +        if ( hw_res.what##n ) \
> +            nr_##what = n; \
> +    } while ( 0 )
> +#define PUT_PC(n) PUT_xC(pc, n)
> +        PUT_PC(2);
> +        PUT_PC(3);
> +        PUT_PC(4);
> +        PUT_PC(6);
> +        PUT_PC(7);
> +        PUT_PC(8);
> +        PUT_PC(9);
> +        PUT_PC(10);
> +#undef PUT_PC
> +#define PUT_CC(n) PUT_xC(cc, n)
> +        PUT_CC(1);
> +        PUT_CC(3);
> +        PUT_CC(6);
> +        PUT_CC(7);
> +#undef PUT_CC
> +#undef PUT_xC
>      }
>
>      usage[0] = idle_usage;
>      res[0] = NOW() - idle_res;
>
> -    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
> -         copy_to_guest(stat->residencies, res, stat->nr) )
> +    if ( copy_to_guest(stat->triggers, usage, nr) ||
> +         copy_to_guest(stat->residencies, res, nr) )
>          return -EFAULT;
>
> -    stat->pc2 = hw_res.pc2;
> -    stat->pc3 = hw_res.pc3;
> -    stat->pc6 = hw_res.pc6;
> -    stat->pc7 = hw_res.pc7;
> -    stat->cc3 = hw_res.cc3;
> -    stat->cc6 = hw_res.cc6;
> -    stat->cc7 = hw_res.cc7;
> +    stat->nr_pc = nr_pc;
> +    stat->nr_cc = nr_cc;
>
>      return 0;
>  }
> --- 2014-02-13.orig/xen/include/public/sysctl.h	2014-03-04 17:43:06.000000000 +0100
> +++ 2014-02-13/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
> @@ -34,7 +34,7 @@
>  #include "xen.h"
>  #include "domctl.h"
>
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
>
>  /*
>   * Read console content from Xen buffer ring.
> @@ -226,13 +226,10 @@ struct pm_cx_stat {
>      uint64_aligned_t idle_time;                 /* idle time from boot */
>      XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
>      XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> -    uint64_aligned_t pc2;
> -    uint64_aligned_t pc3;
> -    uint64_aligned_t pc6;
> -    uint64_aligned_t pc7;
> -    uint64_aligned_t cc3;
> -    uint64_aligned_t cc6;
> -    uint64_aligned_t cc7;
> +    uint32_t nr_pc;                          /* entry nr in pc[] */
> +    uint32_t nr_cc;                          /* entry nr in cc[] */
> +    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
> +    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
>  };
>
>  struct xen_sysctl_get_pmstat {
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 16920 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-13 14:27     ` Jan Beulich
@ 2014-03-13 15:34       ` Ian Campbell
  2014-03-13 15:48         ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Ian Campbell @ 2014-03-13 15:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger

On Thu, 2014-03-13 at 14:27 +0000, Jan Beulich wrote:
> >>> On 13.03.14 at 15:11, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
> >> With the number of these growing it becomes increasingly desirable to
> >> not repeatedly alter the sysctl interface to accommodate them. Replace
> >> the explicit listing of numbered states by arrays,
> > 
> > I don't have much of an opinion on the hypercall interface, so I'm just
> > taking that as a given and looking at the tools side accordingly.
> > 
> >> unused fields of which will remain untouched by the hypercall.
> > 
> > Are you supposed to initialise them to some known sentinel or are the
> > valid entries identified somewhere else (sorry, don't know much about
> > x86 pm)?
> 
> The best thing for the caller would be to zero the whole buffer.
> But other known out-of-range values (like all ones) would do too.
> In the end it's up to the caller to pre-fill the array suitably for it
> to recognize valid fields.

That makes sense. It looks like your xenpm mods don't actually do this
though?

> 
> >> --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
> >> +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
> >> @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch, 
> >>  
> >>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
> >> [...]
> >> +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 
> > *cxpt)
> > 
> > That is an incredibly subtle difference in the naming!
> > 
> > If v1 is considered deprecated then lets just get rid of it. There's
> > only one caller which you update in the next patch. I'd be perfectly
> > happy to have those collapsed and for the old interface to go away
> > immediately (or do it in patch 4/3 if you prefer).
> 
> I actually meant to raise the question of deprecation vs deletion/
> replacement, but then forgot. So are you saying then that deleting/
> replacing a libxc interface is not an issue, i.e. there's no need for
> API compatibility?

Correct. libxc makes no API or ABI guarantees.

> If so I'd indeed prefer to merge this and the 3rd
> patch and simply adjust the structure definition and function
> implementation, without any v2 or other subtleties.

That's fine and even preferred as far as I'm concerned.

Ian.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-13 15:34       ` Ian Campbell
@ 2014-03-13 15:48         ` Jan Beulich
  2014-03-13 15:53           ` Ian Campbell
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-13 15:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger

>>> On 13.03.14 at 16:34, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2014-03-13 at 14:27 +0000, Jan Beulich wrote:
>> >>> On 13.03.14 at 15:11, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
>> >> With the number of these growing it becomes increasingly desirable to
>> >> not repeatedly alter the sysctl interface to accommodate them. Replace
>> >> the explicit listing of numbered states by arrays,
>> > 
>> > I don't have much of an opinion on the hypercall interface, so I'm just
>> > taking that as a given and looking at the tools side accordingly.
>> > 
>> >> unused fields of which will remain untouched by the hypercall.
>> > 
>> > Are you supposed to initialise them to some known sentinel or are the
>> > valid entries identified somewhere else (sorry, don't know much about
>> > x86 pm)?
>> 
>> The best thing for the caller would be to zero the whole buffer.
>> But other known out-of-range values (like all ones) would do too.
>> In the end it's up to the caller to pre-fill the array suitably for it
>> to recognize valid fields.
> 
> That makes sense. It looks like your xenpm mods don't actually do this
> though?

I should be using calloc() there to avoid future questions of this
kind, but in fact it was intentional: The bounce buffers get cleared
upon allocation, and since I made the bounce direction out-only,
the intended effect is what we want. Too subtle/fragile perhaps...
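
To make the pre-fill convention concrete, here is a minimal sketch. It
is not the xenpm or libxc code: fake_get_cx_stat() is a hypothetical
stand-in for the hypercall, and the residency values are made up. It
only illustrates the idea that the callee leaves unused slots untouched
(1-biased, so PC2 lands in pc[1]) while the caller zero-fills the array
up front, so that a non-zero entry marks a valid field.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_PKG_RESIDENCIES 12

/*
 * Hypothetical stand-in for the hypercall: fills only the slots the
 * "hardware" reports and leaves every other entry untouched.
 * Indexing is 1-biased: the residency of PCn is stored in pc[n - 1].
 * Returns the highest state number that was reported.
 */
static unsigned int fake_get_cx_stat(uint64_t *pc, unsigned int nr_pc)
{
    /* Made-up sample data: PC2 and PC6 residencies in ns. */
    static const struct { unsigned int n; uint64_t ns; } reported[] = {
        { 2, 1500000 }, { 6, 42000000 },
    };
    unsigned int i, highest = 0;

    for ( i = 0; i < sizeof(reported) / sizeof(reported[0]); i++ )
    {
        if ( reported[i].n > nr_pc )
            continue;               /* caller's array is too small */
        pc[reported[i].n - 1] = reported[i].ns;
        highest = reported[i].n;
    }

    return highest;
}

/* Caller side: zero-fill first, then count the non-zero (valid) entries. */
static unsigned int count_valid_pc(void)
{
    uint64_t *pc = calloc(MAX_PKG_RESIDENCIES, sizeof(*pc));
    unsigned int i, nr, valid = 0;

    if ( !pc )
        return 0;

    nr = fake_get_cx_stat(pc, MAX_PKG_RESIDENCIES);
    for ( i = 0; i < nr; i++ )
        if ( pc[i] )                /* untouched slots are still zero */
            valid++;

    free(pc);

    return valid;
}
```

With malloc() in place of calloc() the untouched slots would hold
arbitrary data unless something else (such as the bounce buffer copy)
happened to clear them, which is the fragility mentioned above.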

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-13 15:48         ` Jan Beulich
@ 2014-03-13 15:53           ` Ian Campbell
  0 siblings, 0 replies; 31+ messages in thread
From: Ian Campbell @ 2014-03-13 15:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger

On Thu, 2014-03-13 at 15:48 +0000, Jan Beulich wrote:
> >>> On 13.03.14 at 16:34, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2014-03-13 at 14:27 +0000, Jan Beulich wrote:
> >> >>> On 13.03.14 at 15:11, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> > On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
> >> >> With the number of these growing it becomes increasingly desirable to
> >> >> not repeatedly alter the sysctl interface to accommodate them. Replace
> >> >> the explicit listing of numbered states by arrays,
> >> > 
> >> > I don't have much of an opinion on the hypercall interface, so I'm just
> >> > taking that as a given and looking at the tools side accordingly.
> >> > 
> >> >> unused fields of which will remain untouched by the hypercall.
> >> > 
> >> > Are you supposed to initialise them to some known sentinel or are the
> >> > valid entries identified somewhere else (sorry, don't know much about
> >> > x86 pm)?
> >> 
> >> The best thing for the caller would be to zero the whole buffer.
> >> But other known out-of-range values (like all ones) would do too.
> >> In the end it's up to the caller to pre-fill the array suitably for it
> >> to recognize valid fields.
> > 
> > That makes sense. It looks like your xenpm mods don't actually do this
> > though?
> 
> I should be using calloc() there to avoid future questions of this
> kind, but in fact it was intentional: The bounce buffers get cleared
> upon allocation, and since I made the bounce direction out-only,
> the intended effect is what we want. Too subtle/fragile perhaps...

Worthy of a comment at least I think ;-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 0/2] x86: support further Intel CPU families
  2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
                   ` (3 preceding siblings ...)
  2014-03-12  9:38 ` Ping: [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
@ 2014-03-17 13:28 ` Jan Beulich
  2014-03-17 13:38   ` [PATCH v2 1/2] x86: Intel CPU family update Jan Beulich
  2014-03-17 13:39   ` [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs Jan Beulich
  4 siblings, 2 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-17 13:28 UTC (permalink / raw)
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

1: x86: Intel CPU family update
2: x86/idle: update to include further package/core residency MSRs

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: fold previously separate patches 2 and 3; patch 1 is unchanged

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/2] x86: Intel CPU family update
  2014-03-17 13:28 ` [PATCH v2 0/2] " Jan Beulich
@ 2014-03-17 13:38   ` Jan Beulich
  2014-03-17 13:39   ` [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-17 13:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

... according to revision 49 of the Intel SDM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Intel: Clarification is needed that I correctly resolved the ambiguity
the manual has for 06_4D: Table 35-1 lists this among the Silvermont
ones and uses 06_4E for Future Generation Intel Core; section 35.1 and
table 35-24, however, use 06_4D throughout. My take is that the latter
is what is wrong.
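
For readers cross-checking the hunks below: cpu_idle.c and vpmu_core2.c
spell the model numbers in hex while vmx.c uses decimal, so 06_4D is
"case 77" there and 06_4E is "case 78". The following sketch only
restates how the ambiguity was resolved in this patch (0x4D among the
Silvermont models, 0x4E among the future Core ones); the helper names
are hypothetical and do not exist in the tree.

```c
#include <stdbool.h>

/* Models this patch groups under "Silvermont" (decimal form in comments). */
static bool is_silvermont_model(unsigned int model)
{
    switch ( model )
    {
    case 0x37: /* 55 */
    case 0x4A: /* 74 */
    case 0x4D: /* 77 */
    case 0x5A: /* 90 */
    case 0x5D: /* 93 */
        return true;
    }

    return false;
}

/* Models this patch groups under "future" (decimal form in comments). */
static bool is_future_core_model(unsigned int model)
{
    return model == 0x3D /* 61 */ || model == 0x4E /* 78 */;
}
```

Should Intel clarify that table 35-24 is right after all, the 0x4D and
0x4E entries would have to be revisited in all three files.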

--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -139,6 +139,9 @@ static void do_get_hw_residencies(void *
     case 0x3F:
     case 0x45:
     case 0x46:
+    /* future */
+    case 0x3D:
+    case 0x4E:
         GET_PC2_RES(hw_res->pc2);
         GET_CC7_RES(hw_res->cc7);
         /* fall through */
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1966,10 +1966,14 @@ static const struct lbr_info *last_branc
         case 58: case 62:
         /* Haswell */
         case 60: case 63: case 69: case 70:
+        /* future */
+        case 61: case 78:
             return nh_lbr;
             break;
         /* Atom */
-        case 28:
+        case 28: case 38: case 39: case 53: case 54:
+        /* Silvermont */
+        case 55: case 74: case 77: case 90: case 93:
             return at_lbr;
             break;
         }
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -916,6 +916,10 @@ int vmx_vpmu_initialise(struct vcpu *v, 
         case 0x3f:
         case 0x45:
         case 0x46:
+
+        /* future: */
+        case 0x3d:
+        case 0x4e:
             ret = core2_vpmu_initialise(v, vpmu_flags);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;




[-- Attachment #2: x86-Intel-families.patch --]
[-- Type: text/plain, Size: 1771 bytes --]


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs
  2014-03-17 13:28 ` [PATCH v2 0/2] " Jan Beulich
  2014-03-17 13:38   ` [PATCH v2 1/2] x86: Intel CPU family update Jan Beulich
@ 2014-03-17 13:39   ` Jan Beulich
  2014-03-17 13:43     ` Ian Campbell
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-03-17 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: Jun Nakajima, Keir Fraser, Ian Jackson, Ian Campbell,
	Donald D Dugger

[-- Attachment #1: Type: text/plain, Size: 20446 bytes --]

With the number of these growing it becomes increasingly desirable to
not repeatedly alter the sysctl interface to accommodate them. Replace
the explicit listing of numbered states by arrays, unused fields of
which will remain untouched by the hypercall.

The adjusted sysctl interface also fixes an unrelated shortcoming
of the original one: The "nr" field, specifying the size of the
"triggers" and "residencies" arrays, has to be an input (along with
being an output), which the previous implementation did not honor.

Note that the bouncing direction in the libxc interface gets corrected
to OUT (was BOTH) at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
---
v2: Fully replace old interface and merge in previously separate patch
    adjusting xenpm.c. Use calloc() for array allocations in xenpm.c.
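
The in/out handling of "nr" described in the commit message can be
sketched as follows. This is a simplified illustration only: struct
cx_stat_sketch and get_cx_stat_sketch() are hypothetical names, and
memcpy() stands in for copy_to_guest(). The point is that no more than
the caller-provided number of entries gets written, while the field
reports the number of states actually available on return.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for struct pm_cx_stat. */
struct cx_stat_sketch {
    uint32_t nr;            /* IN: capacity of triggers[]; OUT: states available */
    uint64_t *triggers;
};

static int get_cx_stat_sketch(struct cx_stat_sketch *stat,
                              const uint64_t *usage, uint32_t count)
{
    /* Never write more entries than the caller said it has room for. */
    uint32_t nr = stat->nr < count ? stat->nr : count;

    memcpy(stat->triggers, usage, nr * sizeof(*usage));

    /* Report the real count, so the caller can detect truncation. */
    stat->nr = count;

    return 0;
}
```

A caller passing nr = 2 on a system with four states gets two entries
written and nr = 4 back, and can retry with a larger array if it cares
about the remainder.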

--- 2014-03-17.orig/tools/libxc/xc_pm.c	2014-03-17 08:16:19.000000000 +0100
+++ 2014-03-17/tools/libxc/xc_pm.c	2014-03-17 12:19:38.000000000 +0100
@@ -123,45 +123,53 @@ int xc_pm_get_max_cx(xc_interface *xch, 
 int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
 {
     DECLARE_SYSCTL;
-    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-    int max_cx, ret;
+    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
+                                   cxpt->nr * sizeof(*cxpt->triggers),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
+                                   cxpt->nr * sizeof(*cxpt->residencies),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
+                                   cxpt->nr_pc * sizeof(*cxpt->pc),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
+                                   cxpt->nr_cc * sizeof(*cxpt->cc),
+                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int ret = -1;
 
-    if( !cxpt->triggers || !cxpt->residencies )
-        return -EINVAL;
-
-    if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
-        goto unlock_0;
-
-    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
-    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
-
-    ret = -1;
     if ( xc_hypercall_bounce_pre(xch, triggers) )
         goto unlock_0;
     if ( xc_hypercall_bounce_pre(xch, residencies) )
         goto unlock_1;
+    if ( xc_hypercall_bounce_pre(xch, pc) )
+        goto unlock_2;
+    if ( xc_hypercall_bounce_pre(xch, cc) )
+        goto unlock_3;
 
     sysctl.cmd = XEN_SYSCTL_get_pmstat;
     sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
     sysctl.u.get_pmstat.cpuid = cpuid;
+    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
+    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
+    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
     set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
     set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
+    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
+    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
 
     if ( (ret = xc_sysctl(xch, &sysctl)) )
-        goto unlock_2;
+        goto unlock_4;
 
     cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
     cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
     cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
-    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
-    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
-    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
-    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
-    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
-    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
-    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
+    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
+    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
 
+unlock_4:
+    xc_hypercall_bounce_post(xch, cc);
+unlock_3:
+    xc_hypercall_bounce_post(xch, pc);
 unlock_2:
     xc_hypercall_bounce_post(xch, residencies);
 unlock_1:
--- 2014-03-17.orig/tools/libxc/xenctrl.h	2014-01-14 13:33:16.000000000 +0100
+++ 2014-03-17/tools/libxc/xenctrl.h	2014-03-17 12:20:20.000000000 +0100
@@ -1935,18 +1935,15 @@ int xc_pm_get_pxstat(xc_interface *xch, 
 int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
 
 struct xc_cx_stat {
-    uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
+    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
     uint32_t last;         /* last Cx state */
     uint64_t idle_time;    /* idle time from boot */
     uint64_t *triggers;    /* Cx trigger counts */
     uint64_t *residencies; /* Cx residencies */
-    uint64_t pc2;
-    uint64_t pc3;
-    uint64_t pc6;
-    uint64_t pc7;
-    uint64_t cc3;
-    uint64_t cc6;
-    uint64_t cc7;
+    uint32_t nr_pc;        /* entry nr in pc[] */
+    uint32_t nr_cc;        /* entry nr in cc[] */
+    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
+    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
 };
 typedef struct xc_cx_stat xc_cx_stat_t;
 
--- 2014-03-17.orig/tools/misc/xenpm.c	2013-07-16 08:16:10.000000000 +0200
+++ 2014-03-17/tools/misc/xenpm.c	2014-03-17 12:24:45.000000000 +0100
@@ -29,6 +29,9 @@
 #include <inttypes.h>
 #include <sys/time.h>
 
+#define MAX_PKG_RESIDENCIES 12
+#define MAX_CORE_RESIDENCIES 8
+
 #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
 
 static xc_interface *xc_handle;
@@ -102,7 +105,7 @@ static void parse_cpuid_and_int(int argc
 
 static void print_cxstat(int cpuid, struct xc_cx_stat *cxstat)
 {
-    int i;
+    unsigned int i;
 
     printf("cpu id               : %d\n", cpuid);
     printf("total C-states       : %d\n", cxstat->nr);
@@ -115,17 +118,14 @@ static void print_cxstat(int cpuid, stru
         printf("                       residency  [%20"PRIu64" ms]\n",
                cxstat->residencies[i]/1000000UL);
     }
-    printf("pc2                  : [%20"PRIu64" ms]\n"
-           "pc3                  : [%20"PRIu64" ms]\n"
-           "pc6                  : [%20"PRIu64" ms]\n"
-           "pc7                  : [%20"PRIu64" ms]\n",
-            cxstat->pc2/1000000UL, cxstat->pc3/1000000UL,
-            cxstat->pc6/1000000UL, cxstat->pc7/1000000UL);
-    printf("cc3                  : [%20"PRIu64" ms]\n"
-           "cc6                  : [%20"PRIu64" ms]\n"
-           "cc7                  : [%20"PRIu64" ms]\n",
-            cxstat->cc3/1000000UL, cxstat->cc6/1000000UL,
-            cxstat->cc7/1000000UL);
+    for ( i = 0; i < MAX_PKG_RESIDENCIES && i < cxstat->nr_pc; ++i )
+        if ( cxstat->pc[i] )
+           printf("pc%d                  : [%20"PRIu64" ms]\n", i + 1,
+                  cxstat->pc[i] / 1000000UL);
+    for ( i = 0; i < MAX_CORE_RESIDENCIES && i < cxstat->nr_cc; ++i )
+        if ( cxstat->cc[i] )
+           printf("cc%d                  : [%20"PRIu64" ms]\n", i + 1,
+                  cxstat->cc[i] / 1000000UL);
     printf("\n");
 }
 
@@ -145,24 +145,36 @@ static int get_cxstat_by_cpuid(xc_interf
     if ( !max_cx_num )
         return -ENODEV;
 
-    cxstat->triggers = malloc(max_cx_num * sizeof(uint64_t));
-    if ( !cxstat->triggers )
-        return -ENOMEM;
-    cxstat->residencies = malloc(max_cx_num * sizeof(uint64_t));
-    if ( !cxstat->residencies )
+    cxstat->triggers = calloc(max_cx_num, sizeof(*cxstat->triggers));
+    cxstat->residencies = calloc(max_cx_num, sizeof(*cxstat->residencies));
+    cxstat->pc = calloc(MAX_PKG_RESIDENCIES, sizeof(*cxstat->pc));
+    cxstat->cc = calloc(MAX_CORE_RESIDENCIES, sizeof(*cxstat->cc));
+    if ( !cxstat->triggers || !cxstat->residencies ||
+         !cxstat->pc || !cxstat->cc )
     {
+        free(cxstat->cc);
+        free(cxstat->pc);
+        free(cxstat->residencies);
         free(cxstat->triggers);
         return -ENOMEM;
     }
 
+    cxstat->nr = max_cx_num;
+    cxstat->nr_pc = MAX_PKG_RESIDENCIES;
+    cxstat->nr_cc = MAX_CORE_RESIDENCIES;
+
     ret = xc_pm_get_cxstat(xc_handle, cpuid, cxstat);
     if( ret )
     {
         ret = -errno;
         free(cxstat->triggers);
         free(cxstat->residencies);
+        free(cxstat->pc);
+        free(cxstat->cc);
         cxstat->triggers = NULL;
         cxstat->residencies = NULL;
+        cxstat->pc = NULL;
+        cxstat->cc = NULL;
     }
 
     return ret;
@@ -198,6 +210,8 @@ static int show_cxstat_by_cpuid(xc_inter
 
     free(cxstatinfo.triggers);
     free(cxstatinfo.residencies);
+    free(cxstatinfo.pc);
+    free(cxstatinfo.cc);
     return 0;
 }
 
@@ -482,25 +496,26 @@ static void signal_int_handler(int signo
             /* print out CC? and PC? */
             for ( i = 0; i < socket_nr; i++ )
             {
+                unsigned int n;
                 uint64_t res;
+
                 for ( j = 0; j <= info.max_cpu_index; j++ )
                 {
                     if ( cpu_to_socket[j] == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
-                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
-                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
-                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
-                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
-                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
-                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                       100UL * res / (double)sum_cx[j]);
+                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
+                {
+                    if ( n >= cxstat_end[j].nr_pc )
+                        continue;
+                    res = cxstat_end[j].pc[n];
+                    if ( n < cxstat_start[j].nr_pc )
+                        res -= cxstat_start[j].pc[n];
+                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
+                           n + 1, res / 1000000UL,
+                           100UL * res / (double)sum_cx[j]);
+                }
                 for ( k = 0; k < core_nr; k++ )
                 {
                     for ( j = 0; j <= info.max_cpu_index; j++ )
@@ -510,15 +525,17 @@ static void signal_int_handler(int signo
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
-                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
-                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                           100UL * res / (double)sum_cx[j]);
-                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
-                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
-                           100UL * res / (double)sum_cx[j]);
-                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
-                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
-                           100UL * res / (double)sum_cx[j]);
+                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
+                    {
+                        if ( n >= cxstat_end[j].nr_cc )
+                            continue;
+                        res = cxstat_end[j].cc[n];
+                        if ( n < cxstat_start[j].nr_cc )
+                            res -= cxstat_start[j].cc[n];
+                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
+                               n + 1, res / 1000000UL,
+                               100UL * res / (double)sum_cx[j]);
+                    }
                 }
             }
         }
@@ -529,6 +546,8 @@ static void signal_int_handler(int signo
     {
         free(cxstat[i].triggers);
         free(cxstat[i].residencies);
+        free(cxstat[i].pc);
+        free(cxstat[i].cc);
         free(pxstat[i].trans_pt);
         free(pxstat[i].pt);
     }
--- 2014-03-17.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-05 09:52:16.000000000 +0100
+++ 2014-03-17/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
@@ -62,13 +62,17 @@
 
 #define GET_HW_RES_IN_NS(msr, val) \
     do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
-#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
+#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
 #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
 #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
 #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
+#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
+#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
+#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
+#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
 #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
 #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
-#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
+#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
 
 static void lapic_timer_nop(void) { }
 void (*__read_mostly lapic_timer_off)(void);
@@ -111,8 +115,13 @@ struct hw_residencies
 {
     uint64_t pc2;
     uint64_t pc3;
+    uint64_t pc4;
     uint64_t pc6;
     uint64_t pc7;
+    uint64_t pc8;
+    uint64_t pc9;
+    uint64_t pc10;
+    uint64_t cc1;
     uint64_t cc3;
     uint64_t cc6;
     uint64_t cc7;
@@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
 
     switch ( c->x86_model )
     {
+    /* 4th generation Intel Core (Haswell) */
+    case 0x45:
+        GET_PC8_RES(hw_res->pc8);
+        GET_PC9_RES(hw_res->pc9);
+        GET_PC10_RES(hw_res->pc10);
+        /* fall through */
     /* Sandy bridge */
     case 0x2A:
     case 0x2D:
@@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
     /* Haswell */
     case 0x3C:
     case 0x3F:
-    case 0x45:
     case 0x46:
     /* future */
     case 0x3D:
@@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
         GET_CC3_RES(hw_res->cc3);
         GET_CC6_RES(hw_res->cc6);
         break;
+    /* various Atoms */
+    case 0x27:
+        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
+        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
+        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
+        break;
+    /* Silvermont */
+    case 0x37:
+    case 0x4A:
+    case 0x4D:
+    case 0x5A:
+    case 0x5D:
+        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
+        GET_CC1_RES(hw_res->cc1);
+        GET_CC6_RES(hw_res->cc6);
+        break;
     }
 }
 
@@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
 
     get_hw_residencies(cpu, &hw_res);
 
-    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
-           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
-    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
-           hw_res.cc3, hw_res.cc6,hw_res.cc7);
+    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
+           hw_res.pc2,
+           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
+           hw_res.pc6, hw_res.pc7);
+    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
+        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
+               hw_res.pc8, hw_res.pc9, hw_res.pc10);
+    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
+           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
+           hw_res.cc6, hw_res.cc7);
 }
 
 static char* acpi_cstate_method_name[] =
@@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
     struct acpi_processor_power *power = processor_powers[cpuid];
     uint64_t idle_usage = 0, idle_res = 0;
     uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
-    int i;
-    struct hw_residencies hw_res;
+    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
 
     if ( power == NULL )
     {
         stat->last = 0;
         stat->nr = 0;
         stat->idle_time = 0;
+        stat->nr_pc = 0;
+        stat->nr_cc = 0;
         return 0;
     }
 
     stat->last = power->last_state ? power->last_state->idx : 0;
     stat->idle_time = get_cpu_idle_time(cpuid);
+    nr = min(stat->nr, power->count);
 
     /* mimic the stat when detail info hasn't been registered by dom0 */
     if ( pm_idle_save == NULL )
@@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
 
         usage[1] = idle_usage = 1;
         res[1] = idle_res = stat->idle_time;
-
-        memset(&hw_res, 0, sizeof(hw_res));
     }
     else
     {
+        struct hw_residencies hw_res;
+
         stat->nr = power->count;
 
-        for ( i = 1; i < power->count; i++ )
+        for ( i = 1; i < nr; i++ )
         {
             spin_lock_irq(&power->stat_lock);
             usage[i] = power->states[i].usage;
@@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
         }
 
         get_hw_residencies(cpuid, &hw_res);
+
+#define PUT_xC(what, n) do { \
+        if ( stat->nr_##what >= n && \
+             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
+            return -EFAULT; \
+        if ( hw_res.what##n ) \
+            nr_##what = n; \
+    } while ( 0 )
+#define PUT_PC(n) PUT_xC(pc, n)
+        PUT_PC(2);
+        PUT_PC(3);
+        PUT_PC(4);
+        PUT_PC(6);
+        PUT_PC(7);
+        PUT_PC(8);
+        PUT_PC(9);
+        PUT_PC(10);
+#undef PUT_PC
+#define PUT_CC(n) PUT_xC(cc, n)
+        PUT_CC(1);
+        PUT_CC(3);
+        PUT_CC(6);
+        PUT_CC(7);
+#undef PUT_CC
+#undef PUT_xC
     }
 
     usage[0] = idle_usage;
     res[0] = NOW() - idle_res;
 
-    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
-         copy_to_guest(stat->residencies, res, stat->nr) )
+    if ( copy_to_guest(stat->triggers, usage, nr) ||
+         copy_to_guest(stat->residencies, res, nr) )
         return -EFAULT;
 
-    stat->pc2 = hw_res.pc2;
-    stat->pc3 = hw_res.pc3;
-    stat->pc6 = hw_res.pc6;
-    stat->pc7 = hw_res.pc7;
-    stat->cc3 = hw_res.cc3;
-    stat->cc6 = hw_res.cc6;
-    stat->cc7 = hw_res.cc7;
+    stat->nr_pc = nr_pc;
+    stat->nr_cc = nr_cc;
 
     return 0;
 }
--- 2014-03-17.orig/xen/include/public/sysctl.h	2013-05-27 09:58:49.000000000 +0200
+++ 2014-03-17/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
 
 /*
  * Read console content from Xen buffer ring.
@@ -226,13 +226,10 @@ struct pm_cx_stat {
     uint64_aligned_t idle_time;                 /* idle time from boot */
     XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
     XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
-    uint64_aligned_t pc2;
-    uint64_aligned_t pc3;
-    uint64_aligned_t pc6;
-    uint64_aligned_t pc7;
-    uint64_aligned_t cc3;
-    uint64_aligned_t cc6;
-    uint64_aligned_t cc7;
+    uint32_t nr_pc;                          /* entry nr in pc[] */
+    uint32_t nr_cc;                          /* entry nr in cc[] */
+    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
+    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
 };
 
 struct xen_sysctl_get_pmstat {
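
The 1-biased pc[]/cc[] layout introduced above can be illustrated with a
minimal user-space sketch. It is not part of the patch: put_state() and
state_delta() are hypothetical helpers that only mimic what PUT_xC() and
the xenpm delta loops appear to do, assuming states are stored in
ascending order and the caller zero-initialises its arrays (as xenpm does
via calloc()).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of PUT_xC(): slot n - 1 holds the residency of
 * state n. The value is stored only if the caller's array has room for
 * it; the returned high-water mark becomes n whenever the residency is
 * non-zero, whether or not it was stored.
 */
static uint32_t put_state(uint64_t *val, uint32_t capacity, uint32_t n,
                          uint64_t ns, uint32_t highest)
{
    if ( capacity >= n )
        val[n - 1] = ns;
    return ns ? n : highest;
}

/*
 * Hypothetical model of the xenpm delta loops: residency (in ns) that
 * state n accumulated between two snapshots. A state missing from the
 * start snapshot is treated as having started at zero.
 */
static uint64_t state_delta(const uint64_t *start, uint32_t nr_start,
                            const uint64_t *end, uint32_t nr_end,
                            uint32_t capacity, uint32_t n)
{
    uint64_t res;

    if ( n == 0 || n > capacity || n > nr_end )
        return 0;
    res = end[n - 1];
    if ( n <= nr_start )
        res -= start[n - 1];
    return res;
}
```

The consumer clamps against its own capacity as well as the reported
count, because the reported count is the highest state number with a
non-zero residency and may exceed what the caller made room for.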



[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs
  2014-03-17 13:39   ` [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs Jan Beulich
@ 2014-03-17 13:43     ` Ian Campbell
  2014-03-17 15:48       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Ian Campbell @ 2014-03-17 13:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima,
	Donald D Dugger

On Mon, 2014-03-17 at 13:39 +0000, Jan Beulich wrote:
> With the number of these growing it becomes increasingly desirable to
> not repeatedly alter the sysctl interface to accommodate them. Replace
> the explicit listing of numbered states by arrays, unused fields of
> which will remain untouched by the hypercall.
> 
> The adjusted sysctl interface also fixes an unrelated shortcoming
> of the original one: The "nr" field, specifying the size of the
> "triggers" and "residencies" arrays, has to be an input (along with
> being an output), which the previous implementation didn't honor.
> 
> Note that the bouncing direction in the libxc interface gets corrected
> to OUT (was BOTH) at the same time.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Keir Fraser <keir@xen.org>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I assume you will apply.
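
For readers following the "nr" change described in the quoted commit
message, here is a minimal sketch of the in/out convention. It is an
illustration only: get_counts() is a hypothetical stand-in, not the
hypercall, and it assumes the callee copies at most the caller's capacity
and then reports how many entries actually exist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of an in/out count: on entry *nr is the capacity
 * of dst[], on return it is the number of entries the source has. Only
 * min(capacity, count) entries are copied, so a short buffer is never
 * overrun.
 */
static void get_counts(uint64_t *dst, uint32_t *nr,
                       const uint64_t *src, uint32_t count)
{
    uint32_t n = *nr < count ? *nr : count;

    memcpy(dst, src, n * sizeof(*dst));
    *nr = count;
}
```

A caller that receives a count larger than the capacity it passed in
should only read as many entries as it allocated.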

> ---
> v2: Fully replace old interface and merge in previously separate patch
>     adjusting xenpm.c. Use calloc() for array allocations in xenpm.c.
> 
> --- 2014-03-17.orig/tools/libxc/xc_pm.c	2014-03-17 08:16:19.000000000 +0100
> +++ 2014-03-17/tools/libxc/xc_pm.c	2014-03-17 12:19:38.000000000 +0100
> @@ -123,45 +123,53 @@ int xc_pm_get_max_cx(xc_interface *xch, 
>  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
>  {
>      DECLARE_SYSCTL;
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -    int max_cx, ret;
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> +                                   cxpt->nr * sizeof(*cxpt->triggers),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> +                                   cxpt->nr * sizeof(*cxpt->residencies),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
> +                                   cxpt->nr_pc * sizeof(*cxpt->pc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
> +                                   cxpt->nr_cc * sizeof(*cxpt->cc),
> +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    int ret = -1;
>  
> -    if( !cxpt->triggers || !cxpt->residencies )
> -        return -EINVAL;
> -
> -    if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
> -        goto unlock_0;
> -
> -    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
> -    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
> -
> -    ret = -1;
>      if ( xc_hypercall_bounce_pre(xch, triggers) )
>          goto unlock_0;
>      if ( xc_hypercall_bounce_pre(xch, residencies) )
>          goto unlock_1;
> +    if ( xc_hypercall_bounce_pre(xch, pc) )
> +        goto unlock_2;
> +    if ( xc_hypercall_bounce_pre(xch, cc) )
> +        goto unlock_3;
>  
>      sysctl.cmd = XEN_SYSCTL_get_pmstat;
>      sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
>      sysctl.u.get_pmstat.cpuid = cpuid;
> +    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
> +    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
> +    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
>      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
> +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
>  
>      if ( (ret = xc_sysctl(xch, &sysctl)) )
> -        goto unlock_2;
> +        goto unlock_4;
>  
>      cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
>      cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
>      cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
> -    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
> -    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
> -    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
> -    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
> -    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
> -    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
> -    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
> +    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
> +    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
>  
> +unlock_4:
> +    xc_hypercall_bounce_post(xch, cc);
> +unlock_3:
> +    xc_hypercall_bounce_post(xch, pc);
>  unlock_2:
>      xc_hypercall_bounce_post(xch, residencies);
>  unlock_1:
> --- 2014-03-17.orig/tools/libxc/xenctrl.h	2014-01-14 13:33:16.000000000 +0100
> +++ 2014-03-17/tools/libxc/xenctrl.h	2014-03-17 12:20:20.000000000 +0100
> @@ -1935,18 +1935,15 @@ int xc_pm_get_pxstat(xc_interface *xch, 
>  int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
>  
>  struct xc_cx_stat {
> -    uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
> +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
>      uint32_t last;         /* last Cx state */
>      uint64_t idle_time;    /* idle time from boot */
>      uint64_t *triggers;    /* Cx trigger counts */
>      uint64_t *residencies; /* Cx residencies */
> -    uint64_t pc2;
> -    uint64_t pc3;
> -    uint64_t pc6;
> -    uint64_t pc7;
> -    uint64_t cc3;
> -    uint64_t cc6;
> -    uint64_t cc7;
> +    uint32_t nr_pc;        /* entry nr in pc[] */
> +    uint32_t nr_cc;        /* entry nr in cc[] */
> +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
> +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
>  };
>  typedef struct xc_cx_stat xc_cx_stat_t;
>  
> --- 2014-03-17.orig/tools/misc/xenpm.c	2013-07-16 08:16:10.000000000 +0200
> +++ 2014-03-17/tools/misc/xenpm.c	2014-03-17 12:24:45.000000000 +0100
> @@ -29,6 +29,9 @@
>  #include <inttypes.h>
>  #include <sys/time.h>
>  
> +#define MAX_PKG_RESIDENCIES 12
> +#define MAX_CORE_RESIDENCIES 8
> +
>  #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
>  
>  static xc_interface *xc_handle;
> @@ -102,7 +105,7 @@ static void parse_cpuid_and_int(int argc
>  
>  static void print_cxstat(int cpuid, struct xc_cx_stat *cxstat)
>  {
> -    int i;
> +    unsigned int i;
>  
>      printf("cpu id               : %d\n", cpuid);
>      printf("total C-states       : %d\n", cxstat->nr);
> @@ -115,17 +118,14 @@ static void print_cxstat(int cpuid, stru
>          printf("                       residency  [%20"PRIu64" ms]\n",
>                 cxstat->residencies[i]/1000000UL);
>      }
> -    printf("pc2                  : [%20"PRIu64" ms]\n"
> -           "pc3                  : [%20"PRIu64" ms]\n"
> -           "pc6                  : [%20"PRIu64" ms]\n"
> -           "pc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->pc2/1000000UL, cxstat->pc3/1000000UL,
> -            cxstat->pc6/1000000UL, cxstat->pc7/1000000UL);
> -    printf("cc3                  : [%20"PRIu64" ms]\n"
> -           "cc6                  : [%20"PRIu64" ms]\n"
> -           "cc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->cc3/1000000UL, cxstat->cc6/1000000UL,
> -            cxstat->cc7/1000000UL);
> +    for ( i = 0; i < MAX_PKG_RESIDENCIES && i < cxstat->nr_pc; ++i )
> +        if ( cxstat->pc[i] )
> +           printf("pc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->pc[i] / 1000000UL);
> +    for ( i = 0; i < MAX_CORE_RESIDENCIES && i < cxstat->nr_cc; ++i )
> +        if ( cxstat->cc[i] )
> +           printf("cc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->cc[i] / 1000000UL);
>      printf("\n");
>  }
>  
> @@ -145,24 +145,36 @@ static int get_cxstat_by_cpuid(xc_interf
>      if ( !max_cx_num )
>          return -ENODEV;
>  
> -    cxstat->triggers = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->triggers )
> -        return -ENOMEM;
> -    cxstat->residencies = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->residencies )
> +    cxstat->triggers = calloc(max_cx_num, sizeof(*cxstat->triggers));
> +    cxstat->residencies = calloc(max_cx_num, sizeof(*cxstat->residencies));
> +    cxstat->pc = calloc(MAX_PKG_RESIDENCIES, sizeof(*cxstat->pc));
> +    cxstat->cc = calloc(MAX_CORE_RESIDENCIES, sizeof(*cxstat->cc));
> +    if ( !cxstat->triggers || !cxstat->residencies ||
> +         !cxstat->pc || !cxstat->cc )
>      {
> +        free(cxstat->cc);
> +        free(cxstat->pc);
> +        free(cxstat->residencies);
>          free(cxstat->triggers);
>          return -ENOMEM;
>      }
>  
> +    cxstat->nr = max_cx_num;
> +    cxstat->nr_pc = MAX_PKG_RESIDENCIES;
> +    cxstat->nr_cc = MAX_CORE_RESIDENCIES;
> +
>      ret = xc_pm_get_cxstat(xc_handle, cpuid, cxstat);
>      if( ret )
>      {
>          ret = -errno;
>          free(cxstat->triggers);
>          free(cxstat->residencies);
> +        free(cxstat->pc);
> +        free(cxstat->cc);
>          cxstat->triggers = NULL;
>          cxstat->residencies = NULL;
> +        cxstat->pc = NULL;
> +        cxstat->cc = NULL;
>      }
>  
>      return ret;
> @@ -198,6 +210,8 @@ static int show_cxstat_by_cpuid(xc_inter
>  
>      free(cxstatinfo.triggers);
>      free(cxstatinfo.residencies);
> +    free(cxstatinfo.pc);
> +    free(cxstatinfo.cc);
>      return 0;
>  }
>  
> @@ -482,25 +496,26 @@ static void signal_int_handler(int signo
>              /* print out CC? and PC? */
>              for ( i = 0; i < socket_nr; i++ )
>              {
> +                unsigned int n;
>                  uint64_t res;
> +
>                  for ( j = 0; j <= info.max_cpu_index; j++ )
>                  {
>                      if ( cpu_to_socket[j] == socket_ids[i] )
>                          break;
>                  }
>                  printf("\nSocket %d\n", socket_ids[i]);
> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                       100UL * res / (double)sum_cx[j]);
> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
> +                {
> +                    if ( n >= cxstat_end[j].nr_pc )
> +                        continue;
> +                    res = cxstat_end[j].pc[n];
> +                    if ( n < cxstat_start[j].nr_pc )
> +                        res -= cxstat_start[j].pc[n];
> +                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                           n + 1, res / 1000000UL,
> +                           100UL * res / (double)sum_cx[j]);
> +                }
>                  for ( k = 0; k < core_nr; k++ )
>                  {
>                      for ( j = 0; j <= info.max_cpu_index; j++ )
> @@ -510,15 +525,17 @@ static void signal_int_handler(int signo
>                              break;
>                      }
>                      printf("\t Core %d CPU %d\n", core_ids[k], j);
> -                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
> -                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
> -                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL, 
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
> -                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> +                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
> +                    {
> +                        if ( n >= cxstat_end[j].nr_cc )
> +                            continue;
> +                        res = cxstat_end[j].cc[n];
> +                        if ( n < cxstat_start[j].nr_cc )
> +                            res -= cxstat_start[j].cc[n];
> +                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                               n + 1, res / 1000000UL,
> +                               100UL * res / (double)sum_cx[j]);
> +                    }
>                  }
>              }
>          }
> @@ -529,6 +546,8 @@ static void signal_int_handler(int signo
>      {
>          free(cxstat[i].triggers);
>          free(cxstat[i].residencies);
> +        free(cxstat[i].pc);
> +        free(cxstat[i].cc);
>          free(pxstat[i].trans_pt);
>          free(pxstat[i].pt);
>      }
> --- 2014-03-17.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-05 09:52:16.000000000 +0100
> +++ 2014-03-17/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
> @@ -62,13 +62,17 @@
>  
>  #define GET_HW_RES_IN_NS(msr, val) \
>      do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
> -#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
> +#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
>  #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
>  #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
>  #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
> +#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
> +#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
> +#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
> +#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
>  #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
>  #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
> -#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
> +#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
>  
>  static void lapic_timer_nop(void) { }
>  void (*__read_mostly lapic_timer_off)(void);
> @@ -111,8 +115,13 @@ struct hw_residencies
>  {
>      uint64_t pc2;
>      uint64_t pc3;
> +    uint64_t pc4;
>      uint64_t pc6;
>      uint64_t pc7;
> +    uint64_t pc8;
> +    uint64_t pc9;
> +    uint64_t pc10;
> +    uint64_t cc1;
>      uint64_t cc3;
>      uint64_t cc6;
>      uint64_t cc7;
> @@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
>  
>      switch ( c->x86_model )
>      {
> +    /* 4th generation Intel Core (Haswell) */
> +    case 0x45:
> +        GET_PC8_RES(hw_res->pc8);
> +        GET_PC9_RES(hw_res->pc9);
> +        GET_PC10_RES(hw_res->pc10);
> +        /* fall through */
>      /* Sandy bridge */
>      case 0x2A:
>      case 0x2D:
> @@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
>      /* Haswell */
>      case 0x3C:
>      case 0x3F:
> -    case 0x45:
>      case 0x46:
>      /* future */
>      case 0x3D:
> @@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
>          GET_CC3_RES(hw_res->cc3);
>          GET_CC6_RES(hw_res->cc6);
>          break;
> +    /* various Atoms */
> +    case 0x27:
> +        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
> +        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        break;
> +    /* Silvermont */
> +    case 0x37:
> +    case 0x4A:
> +    case 0x4D:
> +    case 0x5A:
> +    case 0x5D:
> +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> +        GET_CC1_RES(hw_res->cc1);
> +        GET_CC6_RES(hw_res->cc6);
> +        break;
>      }
>  }
>  
> @@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
>  
>      get_hw_residencies(cpu, &hw_res);
>  
> -    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
> -           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
> -    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
> -           hw_res.cc3, hw_res.cc6,hw_res.cc7);
> +    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
> +           hw_res.pc2,
> +           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
> +           hw_res.pc6, hw_res.pc7);
> +    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
> +        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
> +               hw_res.pc8, hw_res.pc9, hw_res.pc10);
> +    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
> +           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
> +           hw_res.cc6, hw_res.cc7);
>  }
>  
>  static char* acpi_cstate_method_name[] =
> @@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>      struct acpi_processor_power *power = processor_powers[cpuid];
>      uint64_t idle_usage = 0, idle_res = 0;
>      uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
> -    int i;
> -    struct hw_residencies hw_res;
> +    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
>  
>      if ( power == NULL )
>      {
>          stat->last = 0;
>          stat->nr = 0;
>          stat->idle_time = 0;
> +        stat->nr_pc = 0;
> +        stat->nr_cc = 0;
>          return 0;
>      }
>  
>      stat->last = power->last_state ? power->last_state->idx : 0;
>      stat->idle_time = get_cpu_idle_time(cpuid);
> +    nr = min(stat->nr, power->count);
>  
>      /* mimic the stat when detail info hasn't been registered by dom0 */
>      if ( pm_idle_save == NULL )
> @@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>  
>          usage[1] = idle_usage = 1;
>          res[1] = idle_res = stat->idle_time;
> -
> -        memset(&hw_res, 0, sizeof(hw_res));
>      }
>      else
>      {
> +        struct hw_residencies hw_res;
> +
>          stat->nr = power->count;
>  
> -        for ( i = 1; i < power->count; i++ )
> +        for ( i = 1; i < nr; i++ )
>          {
>              spin_lock_irq(&power->stat_lock);
>              usage[i] = power->states[i].usage;
> @@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
>          }
>  
>          get_hw_residencies(cpuid, &hw_res);
> +
> +#define PUT_xC(what, n) do { \
> +        if ( stat->nr_##what >= n && \
> +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> +            return -EFAULT; \
> +        if ( hw_res.what##n ) \
> +            nr_##what = n; \
> +    } while ( 0 )
> +#define PUT_PC(n) PUT_xC(pc, n)
> +        PUT_PC(2);
> +        PUT_PC(3);
> +        PUT_PC(4);
> +        PUT_PC(6);
> +        PUT_PC(7);
> +        PUT_PC(8);
> +        PUT_PC(9);
> +        PUT_PC(10);
> +#undef PUT_PC
> +#define PUT_CC(n) PUT_xC(cc, n)
> +        PUT_CC(1);
> +        PUT_CC(3);
> +        PUT_CC(6);
> +        PUT_CC(7);
> +#undef PUT_CC
> +#undef PUT_xC
>      }
>  
>      usage[0] = idle_usage;
>      res[0] = NOW() - idle_res;
>  
> -    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
> -         copy_to_guest(stat->residencies, res, stat->nr) )
> +    if ( copy_to_guest(stat->triggers, usage, nr) ||
> +         copy_to_guest(stat->residencies, res, nr) )
>          return -EFAULT;
>  
> -    stat->pc2 = hw_res.pc2;
> -    stat->pc3 = hw_res.pc3;
> -    stat->pc6 = hw_res.pc6;
> -    stat->pc7 = hw_res.pc7;
> -    stat->cc3 = hw_res.cc3;
> -    stat->cc6 = hw_res.cc6;
> -    stat->cc7 = hw_res.cc7;
> +    stat->nr_pc = nr_pc;
> +    stat->nr_cc = nr_cc;
>  
>      return 0;
>  }
> --- 2014-03-17.orig/xen/include/public/sysctl.h	2013-05-27 09:58:49.000000000 +0200
> +++ 2014-03-17/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
> @@ -34,7 +34,7 @@
>  #include "xen.h"
>  #include "domctl.h"
>  
> -#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
> +#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
>  
>  /*
>   * Read console content from Xen buffer ring.
> @@ -226,13 +226,10 @@ struct pm_cx_stat {
>      uint64_aligned_t idle_time;                 /* idle time from boot */
>      XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
>      XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> -    uint64_aligned_t pc2;
> -    uint64_aligned_t pc3;
> -    uint64_aligned_t pc6;
> -    uint64_aligned_t pc7;
> -    uint64_aligned_t cc3;
> -    uint64_aligned_t cc6;
> -    uint64_aligned_t cc7;
> +    uint32_t nr_pc;                          /* entry nr in pc[] */
> +    uint32_t nr_cc;                          /* entry nr in cc[] */
> +    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
> +    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
>  };
>  
>  struct xen_sysctl_get_pmstat {
> 
> 


* Re: [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs
  2014-03-17 13:43     ` Ian Campbell
@ 2014-03-17 15:48       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-17 15:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jun Nakajima, Donald D Dugger

>>> On 17.03.14 at 14:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2014-03-17 at 13:39 +0000, Jan Beulich wrote:
>> With the number of these growing it becomes increasingly desirable to
>> not repeatedly alter the sysctl interface to accommodate them. Replace
>> the explicit listing of numbered states by arrays, unused fields of
>> which will remain untouched by the hypercall.
>> 
>> The adjusted sysctl interface at once fixes an unrelated shortcoming
>> of the original one: The "nr" field, specifying the size of the
>> "triggers" and "residencies" arrays, has to be an input (along with
>> being an output), which the previous implementation didn't obey.
>> 
>> Note that the bouncing direction in the libxc interface at once gets
>> corrected to OUT (was BOTH).
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Acked-by: Keir Fraser <keir@xen.org>
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> I assume you will apply.

Sure, but not without an Intel ack on this and the first patch.

Jan
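
For readers of the archive, the 1-biased indexing of the new pc[]/cc[] arrays can be sketched as below. This is only an illustration under stated assumptions: format_residencies() is a hypothetical helper, not part of libxc or xenpm, and it merely mirrors the "print non-zero entries as state i + 1" loop from the xenpm hunk quoted in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h> /* only needed by callers that compare the result */

/*
 * Hypothetical helper (an assumption for illustration, not a libxc API):
 * format the non-zero entries of a 1-biased residency array.
 * Entry i describes state i + 1, so vals[1] is PC2/CC2 and vals[5] is
 * PC6/CC6. Values are in nanoseconds and are printed in milliseconds.
 * Returns the number of states written to buf.
 */
static unsigned int format_residencies(const char *prefix,
                                       const uint64_t *vals, uint32_t nr,
                                       char *buf, size_t len)
{
    unsigned int i, shown = 0;
    size_t used = 0;

    if ( !len )
        return 0;
    buf[0] = '\0';

    for ( i = 0; i < nr; ++i )
    {
        int n;

        if ( !vals[i] )
            continue; /* state not reported for this CPU */

        n = snprintf(buf + used, len - used, "%s%u=%" PRIu64 "ms ",
                     prefix, i + 1, vals[i] / 1000000);
        if ( n < 0 || (size_t)n >= len - used )
        {
            buf[used] = '\0'; /* drop the partial entry and stop */
            break;
        }
        used += (size_t)n;
        ++shown;
    }

    return shown;
}
```

With a seven-entry array in which only indexes 1 and 5 are non-zero, the helper reports two states, labelled pc2 and pc6, which is the mapping the "1-biased indexing (i.e. excl C0)" comments in the patch describe.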


* Re: [PATCH 1/3] x86: support further Intel CPU families
  2014-03-05 10:36 ` [PATCH 1/3] " Jan Beulich
@ 2014-03-18  2:44   ` Tian, Kevin
  0 siblings, 0 replies; 31+ messages in thread
From: Tian, Kevin @ 2014-03-18  2:44 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Dugger, Donald D, Ian Jackson, Keir Fraser, Ian Campbell,
	Nakajima, Jun

> From: Jan Beulich
> Sent: Wednesday, March 05, 2014 6:37 PM
> 
> ... according to revision 49 of the Intel SDM.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Intel: Clarification is needed that I correctly resolved the ambiguity
> the manual has for 06_4D: Table 35-1 lists this among the Silvermont
> ones and uses 06_4E for Future Generation Intel Core; section 35.1 and
> table 35-24, however, use 06_4D throughout. My take is that the latter
> is what is wrong.

Just confirmed that 06_4D represents Silvermont, and that 06_4E is for a future generation.

Acked-by: Kevin Tian <kevin.tian@intel.com>
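
As a side note on the 06_4D / 06_4E notation: it is the DisplayFamily_DisplayModel value derived from CPUID leaf 1 EAX. A minimal sketch of that decoding follows; the names cpu_sig, decode_cpu_signature() and is_silvermont_model() are hypothetical and exist only for this illustration, and the Silvermont model list is simply copied from the cpu_idle.c hunk in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded values (illustration only). */
struct cpu_sig {
    unsigned int family;
    unsigned int model;
};

/*
 * Decode DisplayFamily/DisplayModel from a CPUID.1:EAX signature value.
 * The extended model is folded in only for base family 0x6 or 0xF, and
 * the extended family is added only for base family 0xF.
 */
static struct cpu_sig decode_cpu_signature(uint32_t eax)
{
    struct cpu_sig sig;
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int base_model = (eax >> 4) & 0xf;
    unsigned int ext_family = (eax >> 20) & 0xff;
    unsigned int ext_model = (eax >> 16) & 0xf;

    sig.family = base_family;
    if ( base_family == 0xf )
        sig.family += ext_family;

    sig.model = base_model;
    if ( base_family == 0x6 || base_family == 0xf )
        sig.model |= ext_model << 4;

    return sig;
}

/* Family 6 models the patch in this thread treats as Silvermont. */
static int is_silvermont_model(unsigned int model)
{
    switch ( model )
    {
    case 0x37: case 0x4A: case 0x4D: case 0x5A: case 0x5D:
        return 1;
    }
    return 0;
}
```

For example, a signature value of 0x000406D8 decodes to family 6, model 0x4D (written 06_4D), while 0x000406E3 decodes to model 0x4E, which the patch files under "future".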

> 
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -139,6 +139,9 @@ static void do_get_hw_residencies(void *
>      case 0x3F:
>      case 0x45:
>      case 0x46:
> +    /* future */
> +    case 0x3D:
> +    case 0x4E:
>          GET_PC2_RES(hw_res->pc2);
>          GET_CC7_RES(hw_res->cc7);
>          /* fall through */
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1966,10 +1966,14 @@ static const struct lbr_info *last_branc
>          case 58: case 62:
>          /* Haswell */
>          case 60: case 63: case 69: case 70:
> +        /* future */
> +        case 61: case 78:
>              return nh_lbr;
>              break;
>          /* Atom */
> -        case 28:
> +        case 28: case 38: case 39: case 53: case 54:
> +        /* Silvermont */
> +        case 55: case 74: case 77: case 90: case 93:
>              return at_lbr;
>              break;
>          }
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -916,6 +916,10 @@ int vmx_vpmu_initialise(struct vcpu *v,
>          case 0x3f:
>          case 0x45:
>          case 0x46:
> +
> +        /* future: */
> +        case 0x3d:
> +        case 0x4e:
>              ret = core2_vpmu_initialise(v, vpmu_flags);
>              if ( !ret )
>                  vpmu->arch_vpmu_ops = &core2_vpmu_ops;
> 
> 


* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-05 10:42   ` Jan Beulich
@ 2014-03-18  2:44     ` Tian, Kevin
  0 siblings, 0 replies; 31+ messages in thread
From: Tian, Kevin @ 2014-03-18  2:44 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Dugger, Donald D, Ian Jackson, Keir Fraser, Ian Campbell,
	Nakajima, Jun

> From: Jan Beulich
> Sent: Wednesday, March 05, 2014 6:43 PM
> 
> >>> On 05.03.14 at 11:37, "Jan Beulich" <JBeulich@suse.com> wrote:
> > With the number of these growing it becomes increasingly desirable to
> > not repeatedly alter the sysctl interface to accommodate them. Replace
> > the explicit listing of numbered states by arrays, unused fields of
> > which will remain untouched by the hypercall.
> 
> Just added this to the description:
> 
> "The adjusted sysctl interface at once fixes an unrelated shortcoming
>  of the original one: The "nr" field, specifying the size of the
>  "triggers" and "residencies" arrays, has to be an input (along with
>  being an output), which the previous implementation didn't obey."
> 
> Sorry for forgetting the first time through.
> 
> Jan
> 
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>

The idea looks OK to me, but I'm not an expert on the tools side. If there are
no more concerns from Ian, here is my ack:

Acked-by: Kevin Tian <kevin.tian@intel.com>
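
The "nr is both an input and an output" contract from the amended description can be sketched as follows. This is a simplified stand-in, not the hypervisor code: fill_stats() is a hypothetical function, and memcpy() takes the place of copy_to_guest(). It only shows the clamping idea behind "nr = min(stat->nr, power->count)".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the hypervisor side (illustration only).
 * On entry *nr is the capacity of dst as supplied by the caller; on exit
 * it is the number of entries the source actually has. Only
 * min(capacity, count) entries are copied, so a short buffer is never
 * overrun and a long one keeps its tail untouched.
 * Returns the number of entries copied.
 */
static uint32_t fill_stats(uint64_t *dst, uint32_t *nr,
                           const uint64_t *src, uint32_t count)
{
    uint32_t n = *nr < count ? *nr : count;

    memcpy(dst, src, n * sizeof(*dst));
    *nr = count; /* report the real count back to the caller */

    return n;
}
```

A caller that passes a capacity of 2 for a 4-entry source gets two entries copied and sees nr come back as 4, so it can tell its buffer was too small; without the input half of the contract the callee has no way to know how much it may write.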

> >
> > --- 2014-02-13.orig/tools/libxc/xc_pm.c	2014-03-04 17:43:06.000000000 +0100
> > +++ 2014-02-13/tools/libxc/xc_pm.c	2014-03-05 08:54:58.000000000 +0100
> > @@ -123,46 +123,90 @@ int xc_pm_get_max_cx(xc_interface *xch,
> >
> >  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt)
> >  {
> > -    DECLARE_SYSCTL;
> > -    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> > -    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies, 0, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> > +    uint64_t pc[7], cc[7];
> > +    struct xc_cx_stat_v2 cxpt2 = {
> > +        .triggers = cxpt->triggers,
> > +        .residencies = cxpt->residencies,
> > +        .nr_pc = sizeof(pc) / sizeof(*pc),
> > +        .nr_cc = sizeof(cc) / sizeof(*cc),
> > +        .pc = pc,
> > +        .cc = cc
> > +    };
> >      int max_cx, ret;
> >
> >      if( !cxpt->triggers || !cxpt->residencies )
> >          return -EINVAL;
> >
> >      if ( (ret = xc_pm_get_max_cx(xch, cpuid, &max_cx)) )
> > -        goto unlock_0;
> > +        return ret;
> >
> > -    HYPERCALL_BOUNCE_SET_SIZE(triggers, max_cx * sizeof(uint64_t));
> > -    HYPERCALL_BOUNCE_SET_SIZE(residencies, max_cx * sizeof(uint64_t));
> > +    cxpt2.nr = max_cx;
> > +    ret = xc_pm_get_cx_stat(xch, cpuid, &cxpt2);
> > +
> > +    cxpt->nr = cxpt2.nr;
> > +    cxpt->last = cxpt2.last;
> > +    cxpt->idle_time = cxpt2.idle_time;
> > +    cxpt->pc2 = pc[1];
> > +    cxpt->pc3 = pc[2];
> > +    cxpt->pc6 = pc[5];
> > +    cxpt->pc7 = pc[6];
> > +    cxpt->cc3 = cc[2];
> > +    cxpt->cc6 = cc[5];
> > +    cxpt->cc7 = cc[6];
> > +
> > +    return ret;
> > +}
> > +
> > +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *cxpt)
> > +{
> > +    DECLARE_SYSCTL;
> > +    DECLARE_NAMED_HYPERCALL_BOUNCE(triggers, cxpt->triggers,
> > +                                   cxpt->nr * sizeof(*cxpt->triggers),
> > +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > +    DECLARE_NAMED_HYPERCALL_BOUNCE(residencies, cxpt->residencies,
> > +                                   cxpt->nr * sizeof(*cxpt->residencies),
> > +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > +    DECLARE_NAMED_HYPERCALL_BOUNCE(pc, cxpt->pc,
> > +                                   cxpt->nr_pc * sizeof(*cxpt->pc),
> > +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > +    DECLARE_NAMED_HYPERCALL_BOUNCE(cc, cxpt->cc,
> > +                                   cxpt->nr_cc * sizeof(*cxpt->cc),
> > +                                   XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > +    int ret = -1;
> >
> > -    ret = -1;
> >      if ( xc_hypercall_bounce_pre(xch, triggers) )
> >          goto unlock_0;
> >      if ( xc_hypercall_bounce_pre(xch, residencies) )
> >          goto unlock_1;
> > +    if ( xc_hypercall_bounce_pre(xch, pc) )
> > +        goto unlock_2;
> > +    if ( xc_hypercall_bounce_pre(xch, cc) )
> > +        goto unlock_3;
> >
> >      sysctl.cmd = XEN_SYSCTL_get_pmstat;
> >      sysctl.u.get_pmstat.type = PMSTAT_get_cxstat;
> >      sysctl.u.get_pmstat.cpuid = cpuid;
> > +    sysctl.u.get_pmstat.u.getcx.nr = cxpt->nr;
> > +    sysctl.u.get_pmstat.u.getcx.nr_pc = cxpt->nr_pc;
> > +    sysctl.u.get_pmstat.u.getcx.nr_cc = cxpt->nr_cc;
> >      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.triggers, triggers);
> >      set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.residencies, residencies);
> > +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.pc, pc);
> > +    set_xen_guest_handle(sysctl.u.get_pmstat.u.getcx.cc, cc);
> >
> >      if ( (ret = xc_sysctl(xch, &sysctl)) )
> > -        goto unlock_2;
> > +        goto unlock_4;
> >
> >      cxpt->nr = sysctl.u.get_pmstat.u.getcx.nr;
> >      cxpt->last = sysctl.u.get_pmstat.u.getcx.last;
> >      cxpt->idle_time = sysctl.u.get_pmstat.u.getcx.idle_time;
> > -    cxpt->pc2 = sysctl.u.get_pmstat.u.getcx.pc2;
> > -    cxpt->pc3 = sysctl.u.get_pmstat.u.getcx.pc3;
> > -    cxpt->pc6 = sysctl.u.get_pmstat.u.getcx.pc6;
> > -    cxpt->pc7 = sysctl.u.get_pmstat.u.getcx.pc7;
> > -    cxpt->cc3 = sysctl.u.get_pmstat.u.getcx.cc3;
> > -    cxpt->cc6 = sysctl.u.get_pmstat.u.getcx.cc6;
> > -    cxpt->cc7 = sysctl.u.get_pmstat.u.getcx.cc7;
> > +    cxpt->nr_pc = sysctl.u.get_pmstat.u.getcx.nr_pc;
> > +    cxpt->nr_cc = sysctl.u.get_pmstat.u.getcx.nr_cc;
> >
> > +unlock_4:
> > +    xc_hypercall_bounce_post(xch, cc);
> > +unlock_3:
> > +    xc_hypercall_bounce_post(xch, pc);
> >  unlock_2:
> >      xc_hypercall_bounce_post(xch, residencies);
> >  unlock_1:
> > --- 2014-02-13.orig/tools/libxc/xenctrl.h	2014-03-04 17:43:06.000000000 +0100
> > +++ 2014-02-13/tools/libxc/xenctrl.h	2014-03-04 17:50:49.000000000 +0100
> > @@ -1934,7 +1934,7 @@ int xc_pm_get_max_px(xc_interface *xch,
> >  int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct xc_px_stat *pxpt);
> >  int xc_pm_reset_pxstat(xc_interface *xch, int cpuid);
> >
> > -struct xc_cx_stat {
> > +struct xc_cx_stat { /* DEPRECATED (use v2 below instead)! */
> >      uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
> >      uint32_t last;         /* last Cx state */
> >      uint64_t idle_time;    /* idle time from boot */
> > @@ -1950,8 +1950,22 @@ struct xc_cx_stat {
> >  };
> >  typedef struct xc_cx_stat xc_cx_stat_t;
> >
> > +struct xc_cx_stat_v2 {
> > +    uint32_t nr;           /* entry nr in triggers[]/residencies[], incl C0 */
> > +    uint32_t last;         /* last Cx state */
> > +    uint64_t idle_time;    /* idle time from boot */
> > +    uint64_t *triggers;    /* Cx trigger counts */
> > +    uint64_t *residencies; /* Cx residencies */
> > +    uint32_t nr_pc;        /* entry nr in pc[] */
> > +    uint32_t nr_cc;        /* entry nr in cc[] */
> > +    uint64_t *pc;          /* 1-biased indexing (i.e. excl C0) */
> > +    uint64_t *cc;          /* 1-biased indexing (i.e. excl C0) */
> > +};
> > +typedef struct xc_cx_stat_v2 xc_cx_stat_v2_t;
> > +
> >  int xc_pm_get_max_cx(xc_interface *xch, int cpuid, int *max_cx);
> >  int xc_pm_get_cxstat(xc_interface *xch, int cpuid, struct xc_cx_stat *cxpt);
> > +int xc_pm_get_cx_stat(xc_interface *xch, int cpuid, struct xc_cx_stat_v2 *);
> >  int xc_pm_reset_cxstat(xc_interface *xch, int cpuid);
> >
> >  int xc_cpu_online(xc_interface *xch, int cpu);
> > --- 2014-02-13.orig/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:43:06.000000000 +0100
> > +++ 2014-02-13/xen/arch/x86/acpi/cpu_idle.c	2014-03-04 17:38:39.000000000 +0100
> > @@ -62,13 +62,17 @@
> >
> >  #define GET_HW_RES_IN_NS(msr, val) \
> >      do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
> > -#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB only */
> > +#define GET_PC2_RES(val)  GET_HW_RES_IN_NS(0x60D, val) /* SNB onwards */
> >  #define GET_PC3_RES(val)  GET_HW_RES_IN_NS(0x3F8, val)
> >  #define GET_PC6_RES(val)  GET_HW_RES_IN_NS(0x3F9, val)
> >  #define GET_PC7_RES(val)  GET_HW_RES_IN_NS(0x3FA, val)
> > +#define GET_PC8_RES(val)  GET_HW_RES_IN_NS(0x630, val) /* some Haswells only */
> > +#define GET_PC9_RES(val)  GET_HW_RES_IN_NS(0x631, val) /* some Haswells only */
> > +#define GET_PC10_RES(val) GET_HW_RES_IN_NS(0x632, val) /* some Haswells only */
> > +#define GET_CC1_RES(val)  GET_HW_RES_IN_NS(0x660, val) /* Silvermont only */
> >  #define GET_CC3_RES(val)  GET_HW_RES_IN_NS(0x3FC, val)
> >  #define GET_CC6_RES(val)  GET_HW_RES_IN_NS(0x3FD, val)
> > -#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB only */
> > +#define GET_CC7_RES(val)  GET_HW_RES_IN_NS(0x3FE, val) /* SNB onwards */
> >
> >  static void lapic_timer_nop(void) { }
> >  void (*__read_mostly lapic_timer_off)(void);
> > @@ -111,8 +115,13 @@ struct hw_residencies
> >  {
> >      uint64_t pc2;
> >      uint64_t pc3;
> > +    uint64_t pc4;
> >      uint64_t pc6;
> >      uint64_t pc7;
> > +    uint64_t pc8;
> > +    uint64_t pc9;
> > +    uint64_t pc10;
> > +    uint64_t cc1;
> >      uint64_t cc3;
> >      uint64_t cc6;
> >      uint64_t cc7;
> > @@ -128,6 +137,12 @@ static void do_get_hw_residencies(void *
> >
> >      switch ( c->x86_model )
> >      {
> > +    /* 4th generation Intel Core (Haswell) */
> > +    case 0x45:
> > +        GET_PC8_RES(hw_res->pc8);
> > +        GET_PC9_RES(hw_res->pc9);
> > +        GET_PC10_RES(hw_res->pc10);
> > +        /* fall through */
> >      /* Sandy bridge */
> >      case 0x2A:
> >      case 0x2D:
> > @@ -137,7 +152,6 @@ static void do_get_hw_residencies(void *
> >      /* Haswell */
> >      case 0x3C:
> >      case 0x3F:
> > -    case 0x45:
> >      case 0x46:
> >      /* future */
> >      case 0x3D:
> > @@ -160,6 +174,22 @@ static void do_get_hw_residencies(void *
> >          GET_CC3_RES(hw_res->cc3);
> >          GET_CC6_RES(hw_res->cc6);
> >          break;
> > +    /* various Atoms */
> > +    case 0x27:
> > +        GET_PC3_RES(hw_res->pc2); /* abusing GET_PC3_RES */
> > +        GET_PC6_RES(hw_res->pc4); /* abusing GET_PC6_RES */
> > +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> > +        break;
> > +    /* Silvermont */
> > +    case 0x37:
> > +    case 0x4A:
> > +    case 0x4D:
> > +    case 0x5A:
> > +    case 0x5D:
> > +        GET_PC7_RES(hw_res->pc6); /* abusing GET_PC7_RES */
> > +        GET_CC1_RES(hw_res->cc1);
> > +        GET_CC6_RES(hw_res->cc6);
> > +        break;
> >      }
> >  }
> >
> > @@ -179,10 +209,16 @@ static void print_hw_residencies(uint32_
> >
> >      get_hw_residencies(cpu, &hw_res);
> >
> > -    printk("PC2[%"PRId64"] PC3[%"PRId64"] PC6[%"PRId64"] PC7[%"PRId64"]\n",
> > -           hw_res.pc2, hw_res.pc3, hw_res.pc6, hw_res.pc7);
> > -    printk("CC3[%"PRId64"] CC6[%"PRId64"] CC7[%"PRId64"]\n",
> > -           hw_res.cc3, hw_res.cc6,hw_res.cc7);
> > +    printk("PC2[%"PRIu64"] PC%d[%"PRIu64"] PC6[%"PRIu64"] PC7[%"PRIu64"]\n",
> > +           hw_res.pc2,
> > +           hw_res.pc4 ? 4 : 3, hw_res.pc4 ?: hw_res.pc3,
> > +           hw_res.pc6, hw_res.pc7);
> > +    if ( hw_res.pc8 | hw_res.pc9 | hw_res.pc10 )
> > +        printk("PC8[%"PRIu64"] PC9[%"PRIu64"] PC10[%"PRIu64"]\n",
> > +               hw_res.pc8, hw_res.pc9, hw_res.pc10);
> > +    printk("CC%d[%"PRIu64"] CC6[%"PRIu64"] CC7[%"PRIu64"]\n",
> > +           hw_res.cc1 ? 1 : 3, hw_res.cc1 ?: hw_res.cc3,
> > +           hw_res.cc6, hw_res.cc7);
> >  }
> >
> >  static char* acpi_cstate_method_name[] =
> > @@ -1097,19 +1133,21 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
> >      struct acpi_processor_power *power = processor_powers[cpuid];
> >      uint64_t idle_usage = 0, idle_res = 0;
> >      uint64_t usage[ACPI_PROCESSOR_MAX_POWER], res[ACPI_PROCESSOR_MAX_POWER];
> > -    int i;
> > -    struct hw_residencies hw_res;
> > +    unsigned int i, nr, nr_pc = 0, nr_cc = 0;
> >
> >      if ( power == NULL )
> >      {
> >          stat->last = 0;
> >          stat->nr = 0;
> >          stat->idle_time = 0;
> > +        stat->nr_pc = 0;
> > +        stat->nr_cc = 0;
> >          return 0;
> >      }
> >
> >      stat->last = power->last_state ? power->last_state->idx : 0;
> >      stat->idle_time = get_cpu_idle_time(cpuid);
> > +    nr = min(stat->nr, power->count);
> >
> >      /* mimic the stat when detail info hasn't been registered by dom0 */
> >      if ( pm_idle_save == NULL )
> > @@ -1118,14 +1156,14 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
> >
> >          usage[1] = idle_usage = 1;
> >          res[1] = idle_res = stat->idle_time;
> > -
> > -        memset(&hw_res, 0, sizeof(hw_res));
> >      }
> >      else
> >      {
> > +        struct hw_residencies hw_res;
> > +
> >          stat->nr = power->count;
> >
> > -        for ( i = 1; i < power->count; i++ )
> > +        for ( i = 1; i < nr; i++ )
> >          {
> >              spin_lock_irq(&power->stat_lock);
> >              usage[i] = power->states[i].usage;
> > @@ -1137,22 +1175,42 @@ int pmstat_get_cx_stat(uint32_t cpuid, s
> >          }
> >
> >          get_hw_residencies(cpuid, &hw_res);
> > +
> > +#define PUT_xC(what, n) do { \
> > +        if ( stat->nr_##what >= n && \
> > +             copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> > +            return -EFAULT; \
> > +        if ( hw_res.what##n ) \
> > +            nr_##what = n; \
> > +    } while ( 0 )
> > +#define PUT_PC(n) PUT_xC(pc, n)
> > +        PUT_PC(2);
> > +        PUT_PC(3);
> > +        PUT_PC(4);
> > +        PUT_PC(6);
> > +        PUT_PC(7);
> > +        PUT_PC(8);
> > +        PUT_PC(9);
> > +        PUT_PC(10);
> > +#undef PUT_PC
> > +#define PUT_CC(n) PUT_xC(cc, n)
> > +        PUT_CC(1);
> > +        PUT_CC(3);
> > +        PUT_CC(6);
> > +        PUT_CC(7);
> > +#undef PUT_CC
> > +#undef PUT_xC
> >      }
> >
> >      usage[0] = idle_usage;
> >      res[0] = NOW() - idle_res;
> >
> > -    if ( copy_to_guest(stat->triggers, usage, stat->nr) ||
> > -         copy_to_guest(stat->residencies, res, stat->nr) )
> > +    if ( copy_to_guest(stat->triggers, usage, nr) ||
> > +         copy_to_guest(stat->residencies, res, nr) )
> >          return -EFAULT;
> >
> > -    stat->pc2 = hw_res.pc2;
> > -    stat->pc3 = hw_res.pc3;
> > -    stat->pc6 = hw_res.pc6;
> > -    stat->pc7 = hw_res.pc7;
> > -    stat->cc3 = hw_res.cc3;
> > -    stat->cc6 = hw_res.cc6;
> > -    stat->cc7 = hw_res.cc7;
> > +    stat->nr_pc = nr_pc;
> > +    stat->nr_cc = nr_cc;
> >
> >      return 0;
> >  }
> > --- 2014-02-13.orig/xen/include/public/sysctl.h	2014-03-04 17:43:06.000000000 +0100
> > +++ 2014-02-13/xen/include/public/sysctl.h	2014-03-04 17:34:15.000000000 +0100
> > @@ -34,7 +34,7 @@
> >  #include "xen.h"
> >  #include "domctl.h"
> >
> > -#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000A
> > +#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
> >
> >  /*
> >   * Read console content from Xen buffer ring.
> > @@ -226,13 +226,10 @@ struct pm_cx_stat {
> >      uint64_aligned_t idle_time;                 /* idle time from boot */
> >      XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
> >      XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> > -    uint64_aligned_t pc2;
> > -    uint64_aligned_t pc3;
> > -    uint64_aligned_t pc6;
> > -    uint64_aligned_t pc7;
> > -    uint64_aligned_t cc3;
> > -    uint64_aligned_t cc6;
> > -    uint64_aligned_t cc7;
> > +    uint32_t nr_pc;                          /* entry nr in pc[] */
> > +    uint32_t nr_cc;                          /* entry nr in cc[] */
> > +    XEN_GUEST_HANDLE_64(uint64) pc;          /* 1-biased indexing */
> > +    XEN_GUEST_HANDLE_64(uint64) cc;          /* 1-biased indexing */
> >  };
> >
> >  struct xen_sysctl_get_pmstat {


* Re: [PATCH 3/3] xenpm: use new Cx statistics interface
  2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
  2014-03-05 15:47   ` Boris Ostrovsky
  2014-03-13 14:12   ` Ian Campbell
@ 2014-03-18  2:45   ` Tian, Kevin
  2 siblings, 0 replies; 31+ messages in thread
From: Tian, Kevin @ 2014-03-18  2:45 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Dugger, Donald D, Ian Jackson, Keir Fraser, Ian Campbell,
	Nakajima, Jun

> From: Jan Beulich
> Sent: Wednesday, March 05, 2014 6:38 PM
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> 
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -29,6 +29,9 @@
>  #include <inttypes.h>
>  #include <sys/time.h>
> 
> +#define MAX_PKG_RESIDENCIES 12
> +#define MAX_CORE_RESIDENCIES 8
> +
>  #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
> 
>  static xc_interface *xc_handle;
> @@ -100,9 +103,9 @@ static void parse_cpuid_and_int(int argc
>      }
>  }
> 
> -static void print_cxstat(int cpuid, struct xc_cx_stat *cxstat)
> +static void print_cxstat(int cpuid, const struct xc_cx_stat_v2 *cxstat)
>  {
> -    int i;
> +    unsigned int i;
> 
>      printf("cpu id               : %d\n", cpuid);
>      printf("total C-states       : %d\n", cxstat->nr);
> @@ -115,22 +118,20 @@ static void print_cxstat(int cpuid, stru
>          printf("                       residency  [%20"PRIu64" ms]\n",
>                 cxstat->residencies[i]/1000000UL);
>      }
> -    printf("pc2                  : [%20"PRIu64" ms]\n"
> -           "pc3                  : [%20"PRIu64" ms]\n"
> -           "pc6                  : [%20"PRIu64" ms]\n"
> -           "pc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->pc2/1000000UL, cxstat->pc3/1000000UL,
> -            cxstat->pc6/1000000UL, cxstat->pc7/1000000UL);
> -    printf("cc3                  : [%20"PRIu64" ms]\n"
> -           "cc6                  : [%20"PRIu64" ms]\n"
> -           "cc7                  : [%20"PRIu64" ms]\n",
> -            cxstat->cc3/1000000UL, cxstat->cc6/1000000UL,
> -            cxstat->cc7/1000000UL);
> +    for ( i = 0; i < MAX_PKG_RESIDENCIES && i < cxstat->nr_pc; ++i )
> +        if ( cxstat->pc[i] )
> +           printf("pc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->pc[i] / 1000000UL);
> +    for ( i = 0; i < MAX_CORE_RESIDENCIES && i < cxstat->nr_cc; ++i )
> +        if ( cxstat->cc[i] )
> +           printf("cc%d                  : [%20"PRIu64" ms]\n", i + 1,
> +                  cxstat->cc[i] / 1000000UL);
>      printf("\n");
>  }
> 
>  /* show cpu idle information on CPU cpuid */
> -static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid, struct xc_cx_stat *cxstat)
> +static int get_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid,
> +                               struct xc_cx_stat_v2 *cxstat)
>  {
>      int ret = 0;
>      int max_cx_num = 0;
> @@ -145,24 +146,36 @@ static int get_cxstat_by_cpuid(xc_interf
>      if ( !max_cx_num )
>          return -ENODEV;
> 
> -    cxstat->triggers = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->triggers )
> -        return -ENOMEM;
> -    cxstat->residencies = malloc(max_cx_num * sizeof(uint64_t));
> -    if ( !cxstat->residencies )
> +    cxstat->triggers = malloc(max_cx_num * sizeof(*cxstat->triggers));
> +    cxstat->residencies = malloc(max_cx_num * sizeof(*cxstat->residencies));
> +    cxstat->pc = malloc(MAX_PKG_RESIDENCIES * sizeof(*cxstat->pc));
> +    cxstat->cc = malloc(MAX_CORE_RESIDENCIES * sizeof(*cxstat->cc));
> +    if ( !cxstat->triggers || !cxstat->residencies ||
> +         !cxstat->pc || !cxstat->cc )
>      {
> +        free(cxstat->cc);
> +        free(cxstat->pc);
> +        free(cxstat->residencies);
>          free(cxstat->triggers);
>          return -ENOMEM;
>      }
> 
> -    ret = xc_pm_get_cxstat(xc_handle, cpuid, cxstat);
> +    cxstat->nr = max_cx_num;
> +    cxstat->nr_pc = MAX_PKG_RESIDENCIES;
> +    cxstat->nr_cc = MAX_CORE_RESIDENCIES;
> +
> +    ret = xc_pm_get_cx_stat(xc_handle, cpuid, cxstat);
>      if( ret )
>      {
>          ret = -errno;
>          free(cxstat->triggers);
>          free(cxstat->residencies);
> +        free(cxstat->pc);
> +        free(cxstat->cc);
>          cxstat->triggers = NULL;
>          cxstat->residencies = NULL;
> +        cxstat->pc = NULL;
> +        cxstat->cc = NULL;
>      }
> 
>      return ret;
> @@ -183,7 +196,7 @@ static int show_max_cstate(xc_interface
>  static int show_cxstat_by_cpuid(xc_interface *xc_handle, int cpuid)
>  {
>      int ret = 0;
> -    struct xc_cx_stat cxstatinfo;
> +    struct xc_cx_stat_v2 cxstatinfo;
> 
>      ret = get_cxstat_by_cpuid(xc_handle, cpuid, &cxstatinfo);
>      if ( ret )
> @@ -198,6 +211,8 @@ static int show_cxstat_by_cpuid(xc_inter
> 
>      free(cxstatinfo.triggers);
>      free(cxstatinfo.residencies);
> +    free(cxstatinfo.pc);
> +    free(cxstatinfo.cc);
>      return 0;
>  }
> 
> @@ -331,7 +346,7 @@ void pxstat_func(int argc, char *argv[])
>  }
> 
>  static uint64_t usec_start, usec_end;
> -static struct xc_cx_stat *cxstat, *cxstat_start, *cxstat_end;
> +static struct xc_cx_stat_v2 *cxstat, *cxstat_start, *cxstat_end;
>  static struct xc_px_stat *pxstat, *pxstat_start, *pxstat_end;
>  static int *avgfreq;
>  static uint64_t *sum, *sum_cx, *sum_px;
> @@ -482,25 +497,26 @@ static void signal_int_handler(int signo
>              /* print out CC? and PC? */
>              for ( i = 0; i < socket_nr; i++ )
>              {
> +                unsigned int n;
>                  uint64_t res;
> +
>                  for ( j = 0; j <= info.max_cpu_index; j++ )
>                  {
>                      if ( cpu_to_socket[j] == socket_ids[i] )
>                          break;
>                  }
>                  printf("\nSocket %d\n", socket_ids[i]);
> -                res = cxstat_end[j].pc2 - cxstat_start[j].pc2;
> -                printf("\tPC2\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc3 - cxstat_start[j].pc3;
> -                printf("\tPC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc6 - cxstat_start[j].pc6;
> -                printf("\tPC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> -                res = cxstat_end[j].pc7 - cxstat_start[j].pc7;
> -                printf("\tPC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                       100UL * res / (double)sum_cx[j]);
> +                for ( n = 0; n < MAX_PKG_RESIDENCIES; ++n )
> +                {
> +                    if ( n >= cxstat_end[j].nr_pc )
> +                        continue;
> +                    res = cxstat_end[j].pc[n];
> +                    if ( n < cxstat_start[j].nr_pc )
> +                        res -= cxstat_start[j].pc[n];
> +                    printf("\tPC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                           n + 1, res / 1000000UL,
> +                           100UL * res / (double)sum_cx[j]);
> +                }
>                  for ( k = 0; k < core_nr; k++ )
>                  {
>                      for ( j = 0; j <= info.max_cpu_index; j++ )
> @@ -510,15 +526,17 @@ static void signal_int_handler(int signo
>                              break;
>                      }
>                      printf("\t Core %d CPU %d\n", core_ids[k], j);
> -                    res = cxstat_end[j].cc3 - cxstat_start[j].cc3;
> -                    printf("\t\tCC3\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc6 - cxstat_start[j].cc6;
> -                    printf("\t\tCC6\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> -                    res = cxstat_end[j].cc7 - cxstat_start[j].cc7;
> -                    printf("\t\tCC7\t%"PRIu64" ms\t%.2f%%\n",  res / 1000000UL,
> -                           100UL * res / (double)sum_cx[j]);
> +                    for ( n = 0; n < MAX_CORE_RESIDENCIES; ++n )
> +                    {
> +                        if ( n >= cxstat_end[j].nr_cc )
> +                            continue;
> +                        res = cxstat_end[j].cc[n];
> +                        if ( n < cxstat_start[j].nr_cc )
> +                            res -= cxstat_start[j].cc[n];
> +                        printf("\t\tCC%u\t%"PRIu64" ms\t%.2f%%\n",
> +                               n + 1, res / 1000000UL,
> +                               100UL * res / (double)sum_cx[j]);
> +                    }
>                  }
>              }
>          }
> @@ -529,6 +547,8 @@ static void signal_int_handler(int signo
>      {
>          free(cxstat[i].triggers);
>          free(cxstat[i].residencies);
> +        free(cxstat[i].pc);
> +        free(cxstat[i].cc);
>          free(pxstat[i].trans_pt);
>          free(pxstat[i].pt);
>      }
> 
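
The per-socket and per-core loops in the patch above compute a residency
delta between two samples, where slot n of pc[] or cc[] stands for state
n + 1 (1-biased indexing). A minimal sketch of that arithmetic follows,
assuming stand-alone arrays instead of struct xc_cx_stat_v2. The helper
names residency_delta() and format_pc_line() are hypothetical and appear
nowhere in xenpm; the division by 1000000 to print "ms" is taken from the
patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: residency accumulated in slot n between two samples.
 * A slot at or beyond nr_end is treated as absent (0); a slot that the start
 * sample did not cover is taken as starting from 0, as the patch does.
 */
static uint64_t residency_delta(const uint64_t *start, uint32_t nr_start,
                                const uint64_t *end, uint32_t nr_end,
                                unsigned int n)
{
    uint64_t res;

    if ( n >= nr_end )
        return 0;
    res = end[n];
    if ( n < nr_start )
        res -= start[n];
    return res;
}

/*
 * Hypothetical helper: format one line in the style of the patch's output.
 * Slot n is printed as state n + 1; res and total share one time unit.
 */
static int format_pc_line(char *buf, size_t len, unsigned int n,
                          uint64_t res, uint64_t total)
{
    assert(total != 0);
    return snprintf(buf, len, "PC%u\t%" PRIu64 " ms\t%.2f%%",
                    n + 1, res / 1000000UL, 100.0 * res / (double)total);
}
```

Keeping the bounds checks in one helper means the package and core loops
could share the same code path; whether that is worth doing in xenpm itself
is a judgment call this sketch does not settle.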


* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-13 14:11   ` Ian Campbell
  2014-03-13 14:27     ` Jan Beulich
@ 2014-03-18 16:18     ` Ian Jackson
  2014-03-18 16:25       ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Ian Jackson @ 2014-03-18 16:18 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Jun Nakajima, xen-devel, Keir Fraser, Donald D Dugger,
	Jan Beulich

Ian Campbell writes ("Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs"):
> On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
> > unused fields of which will remain untouched by the hypercall.
> 
> Are you supposed to initialise them to some known sentinel or are the
> valid entries identified somewhere else (sorry, don't know much about
> x86 pm).

I see Jan has answered this.  But I think the answer should be in the
hypercall interface documentation, in the Xen public headers.

Would it be too much to ask that this hypercall be properly documented
as part of these changes ?

Ian.


* Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs
  2014-03-18 16:18     ` Ian Jackson
@ 2014-03-18 16:25       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-03-18 16:25 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson
  Cc: xen-devel, Keir Fraser, Jun Nakajima, Donald D Dugger

>>> On 18.03.14 at 17:18, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Ian Campbell writes ("Re: [PATCH 2/3] x86/idle: update to include further package/core residency MSRs"):
>> On Wed, 2014-03-05 at 10:37 +0000, Jan Beulich wrote:
>> > unused fields of which will remain untouched by the hypercall.
>> 
>> Are you supposed to initialise them to some known sentinel or are the
>> valid entries identified somewhere else (sorry, don't know much about
>> x86 pm).
> 
> I see Jan has answered this.  But I think the answer should be in the
> hypercall interface documentation, in the Xen public headers.
> 
> Would it be too much to ask that this hypercall be properly documented
> as part of these changes ?

I can certainly add a line saying so.

Jan
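
For readers following the question about untouched entries, here is a
minimal sketch of one way a caller could make them distinguishable:
zero the buffer before the call, then clamp the returned count to the
buffer's capacity. Everything here is an assumption for illustration:
struct cx_stat_sketch, mock_get_cx_stat() and query_pc() are hypothetical,
and the mock only imitates the PUT_PC() sequence from the quoted patch
(states 2, 3, 4, 6, 7, 8, 9 and 10). It is not the real hypercall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the package part of the statistics structure. */
struct cx_stat_sketch {
    uint32_t nr_pc;  /* in: capacity of pc[]; out: highest non-zero state */
    uint64_t *pc;    /* 1-biased: pc[n - 1] holds the PCn residency */
};

/* Package states for which the quoted patch issues PUT_PC(). */
static const unsigned int pc_states[] = { 2, 3, 4, 6, 7, 8, 9, 10 };

/*
 * Hypothetical mock of the hypercall. hw[] is indexed by state number
 * (0..10). Slots with no matching state (PC1, PC5) or beyond the caller's
 * capacity are never written.
 */
static void mock_get_cx_stat(struct cx_stat_sketch *stat, const uint64_t *hw)
{
    uint32_t capacity = stat->nr_pc, nr_pc = 0;
    size_t i;

    for ( i = 0; i < sizeof(pc_states) / sizeof(pc_states[0]); i++ )
    {
        unsigned int n = pc_states[i];

        if ( capacity >= n )
            stat->pc[n - 1] = hw[n];
        if ( hw[n] )
            nr_pc = n;
    }
    stat->nr_pc = nr_pc;
}

/*
 * Caller side: zero the buffer first so untouched slots read as 0, then
 * return the number of slots that are safe to inspect.
 */
static uint32_t query_pc(uint64_t *buf, uint32_t capacity, const uint64_t *hw)
{
    struct cx_stat_sketch stat = { .nr_pc = capacity, .pc = buf };

    assert(buf != NULL);
    memset(buf, 0, capacity * sizeof(*buf));
    mock_get_cx_stat(&stat, hw);
    return stat.nr_pc < capacity ? stat.nr_pc : capacity;
}
```

With this convention a zero entry simply means "no residency reported",
which matches how the xenpm patch skips zero values when printing.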


Thread overview: 31+ messages
2014-03-05 10:34 [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
2014-03-05 10:36 ` [PATCH 1/3] " Jan Beulich
2014-03-18  2:44   ` Tian, Kevin
2014-03-05 10:37 ` [PATCH 2/3] x86/idle: update to include further package/core residency MSRs Jan Beulich
2014-03-05 10:42   ` Jan Beulich
2014-03-18  2:44     ` Tian, Kevin
2014-03-05 15:07   ` Boris Ostrovsky
2014-03-05 15:15     ` Jan Beulich
2014-03-05 15:30       ` Boris Ostrovsky
2014-03-13 14:11   ` Ian Campbell
2014-03-13 14:27     ` Jan Beulich
2014-03-13 15:34       ` Ian Campbell
2014-03-13 15:48         ` Jan Beulich
2014-03-13 15:53           ` Ian Campbell
2014-03-18 16:18     ` Ian Jackson
2014-03-18 16:25       ` Jan Beulich
2014-03-13 14:28   ` Keir Fraser
2014-03-05 10:37 ` [PATCH 3/3] xenpm: use new Cx statistics interface Jan Beulich
2014-03-05 15:47   ` Boris Ostrovsky
2014-03-05 15:53     ` Jan Beulich
2014-03-05 17:05       ` Boris Ostrovsky
2014-03-06  9:37         ` Jan Beulich
2014-03-13 14:12   ` Ian Campbell
2014-03-18  2:45   ` Tian, Kevin
2014-03-12  9:38 ` Ping: [PATCH 0/3] x86: support further Intel CPU families Jan Beulich
2014-03-12 10:18   ` Ian Campbell
2014-03-17 13:28 ` [PATCH v2 0/2] " Jan Beulich
2014-03-17 13:38   ` [PATCH v2 1/2] x86: Intel CPU family update Jan Beulich
2014-03-17 13:39   ` [PATCH v2 2/2] x86/idle: update to include further package/core residency MSRs Jan Beulich
2014-03-17 13:43     ` Ian Campbell
2014-03-17 15:48       ` Jan Beulich
