* Core parking feature enable
@ 2012-02-17 8:54 Liu, Jinsong
2012-02-17 9:41 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-02-17 8:54 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Jan Beulich, keir.xen@gmail.com,
xen-devel@lists.xensource.com
Cc: Li, Shaohua
Core parking is a power control feature and it can co-work with NPTM to control system power budget through online/offline some CPUs in the system. These patches implement core parking feature for xen. They consist of 2 parts: dom0 patches and xen hypervisor patches.
At dom0 side, patches include
[Patch 1/3] intercept native pad (Processor Aggregator Device) logic, providing a native interface for natvie platform and a paravirt template for paravirt platform, so that os can implicitly hook to proper ops accordingly;
[Patch 2/3] redirect paravirt template to Xen pv ops;
[Patch 3/3] implement Xen pad logic, and when getting pad device notification, it hypercalls to Xen hypervisor for core parking. Due to the characteristic of xen continue_hypercall_on_cpu, dom0 seperately send/get core parking request/result;
At Xen hypervisor side, patches include
[Patch 1/2] implement hypercall through which dom0 send core parking request, and get core parking result;
[Patch 2/2] implement Xen core parking. Different core parking sequence has different power/performance result, due to cpu socket/core/thread topology. This patch provide power-first and performance-first policies, users can choose core parking policy on their own demand, considering power and performance tradeoff.
cc Shaohua, core parking feature author at linux kernel side.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-17 8:54 Core parking feature enable Liu, Jinsong
@ 2012-02-17 9:41 ` Jan Beulich
2012-02-17 17:48 ` Liu, Jinsong
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-02-17 9:41 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, keir.xen@gmail.com, Konrad Rzeszutek Wilk, xen-devel
>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Core parking is a power control feature and it can co-work with NPTM to
> control system power budget through online/offline some CPUs in the system.
> These patches implement core parking feature for xen. They consist of 2
> parts: dom0 patches and xen hypervisor patches.
>
> At dom0 side, patches include
> [Patch 1/3] intercept native pad (Processor Aggregator Device) logic,
> providing a native interface for natvie platform and a paravirt template for
> paravirt platform, so that os can implicitly hook to proper ops accordingly;
> [Patch 2/3] redirect paravirt template to Xen pv ops;
> [Patch 3/3] implement Xen pad logic, and when getting pad device
> notification, it hypercalls to Xen hypervisor for core parking. Due to the
> characteristic of xen continue_hypercall_on_cpu, dom0 seperately send/get
> core parking request/result;
>
> At Xen hypervisor side, patches include
> [Patch 1/2] implement hypercall through which dom0 send core parking
> request, and get core parking result;
> [Patch 2/2] implement Xen core parking. Different core parking sequence has
> different power/performance result, due to cpu socket/core/thread topology.
> This patch provide power-first and performance-first policies, users can choose
> core parking policy on their own demand, considering power and performance
> tradeoff.
Does this really need to be implemented in the hypervisor? All this
boils down to is a wrapper around cpu_down() and cpu_up(), which
have hypercall interfaces already. So I'd rather see this as being
an extension to Dom0's pCPU management patches (which aren't
upstream afaict)...
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-17 9:41 ` Jan Beulich
@ 2012-02-17 17:48 ` Liu, Jinsong
2012-02-21 8:03 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-02-17 17:48 UTC (permalink / raw)
To: Jan Beulich
Cc: Li, Shaohua, keir.xen@gmail.com, Konrad Rzeszutek Wilk, xen-devel
Jan Beulich wrote:
>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Core parking is a power control feature and it can co-work with NPTM
>> to control system power budget through online/offline some CPUs in
>> the system. These patches implement core parking feature for xen.
>> They consist of 2
>> parts: dom0 patches and xen hypervisor patches.
>>
>> At dom0 side, patches include
>> [Patch 1/3] intercept native pad (Processor Aggregator Device) logic,
>> providing a native interface for natvie platform and a paravirt
>> template for paravirt platform, so that os can implicitly hook to
>> proper ops accordingly; [Patch 2/3] redirect paravirt template to
>> Xen pv ops; [Patch 3/3] implement Xen pad logic, and when getting
>> pad device
>> notification, it hypercalls to Xen hypervisor for core parking. Due
>> to the characteristic of xen continue_hypercall_on_cpu, dom0
>> seperately send/get
>> core parking request/result;
>>
>> At Xen hypervisor side, patches include
>> [Patch 1/2] implement hypercall through which dom0 send core parking
>> request, and get core parking result;
>> [Patch 2/2] implement Xen core parking. Different core parking
>> sequence has different power/performance result, due to cpu
>> socket/core/thread topology. This patch provide power-first and
>> performance-first policies, users can choose core parking policy on
>> their own demand, considering power and performance tradeoff.
>
> Does this really need to be implemented in the hypervisor? All this
> boils down to is a wrapper around cpu_down() and cpu_up(), which
> have hypercall interfaces already. So I'd rather see this as being
> an extension to Dom0's pCPU management patches (which aren't
> upstream afaict)...
>
> Jan
It's a design choice. Core parking is not only a wrapper around cpu_down/up, it also involves policy algorithms which depend on physical cpu topology and cpu_online/present_map, etc. Implement core parking at dom0 side need expose all those information to dom0, with potential issues (like coherence), while dom0 still need do same work as hypervisor.
Our idea is to keep dom0 as ACPI parser, then hypercall and do rest things at hypervisor side.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-17 17:48 ` Liu, Jinsong
@ 2012-02-21 8:03 ` Jan Beulich
2012-02-22 3:19 ` Liu, Jinsong
[not found] ` <DE8DF0795D48FD4CA783C40EC82923350A7F35@SHSMSX101.ccr.corp.intel.com>
0 siblings, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2012-02-21 8:03 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, keir.xen@gmail.com, Konrad RzeszutekWilk, xen-devel
>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Core parking is a power control feature and it can co-work with NPTM
>>> to control system power budget through online/offline some CPUs in
>>> the system. These patches implement core parking feature for xen.
>>> They consist of 2
>>> parts: dom0 patches and xen hypervisor patches.
>>>
>>> At dom0 side, patches include
>>> [Patch 1/3] intercept native pad (Processor Aggregator Device) logic,
>>> providing a native interface for natvie platform and a paravirt
>>> template for paravirt platform, so that os can implicitly hook to
>>> proper ops accordingly; [Patch 2/3] redirect paravirt template to
>>> Xen pv ops; [Patch 3/3] implement Xen pad logic, and when getting
>>> pad device
>>> notification, it hypercalls to Xen hypervisor for core parking. Due
>>> to the characteristic of xen continue_hypercall_on_cpu, dom0
>>> seperately send/get
>>> core parking request/result;
>>>
>>> At Xen hypervisor side, patches include
>>> [Patch 1/2] implement hypercall through which dom0 send core parking
>>> request, and get core parking result;
>>> [Patch 2/2] implement Xen core parking. Different core parking
>>> sequence has different power/performance result, due to cpu
>>> socket/core/thread topology. This patch provide power-first and
>>> performance-first policies, users can choose core parking policy on
>>> their own demand, considering power and performance tradeoff.
>>
>> Does this really need to be implemented in the hypervisor? All this
>> boils down to is a wrapper around cpu_down() and cpu_up(), which
>> have hypercall interfaces already. So I'd rather see this as being
>> an extension to Dom0's pCPU management patches (which aren't
>> upstream afaict)...
>>
>> Jan
>
> It's a design choice. Core parking is not only a wrapper around cpu_down/up,
> it also involves policy algorithms which depend on physical cpu topology and
> cpu_online/present_map, etc. Implement core parking at dom0 side need expose
> all those information to dom0, with potential issues (like coherence), while
> dom0 still need do same work as hypervisor.
> Our idea is to keep dom0 as ACPI parser, then hypercall and do rest things
> at hypervisor side.
Actually, after some more thought, I don't even think this ought to
be implemented in the Dom0 kernel, but in user space altogether.
Afaict all information necessary is already being exposed.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-21 8:03 ` Jan Beulich
@ 2012-02-22 3:19 ` Liu, Jinsong
[not found] ` <DE8DF0795D48FD4CA783C40EC82923350A7F35@SHSMSX101.ccr.corp.intel.com>
1 sibling, 0 replies; 22+ messages in thread
From: Liu, Jinsong @ 2012-02-22 3:19 UTC (permalink / raw)
To: Jan Beulich
Cc: Li, Shaohua, keir.xen@gmail.com, Konrad RzeszutekWilk, xen-devel
Jan Beulich wrote:
>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Core parking is a power control feature and it can co-work with
>>>> NPTM to control system power budget through online/offline some
>>>> CPUs in the system. These patches implement core parking feature
>>>> for xen. They consist of 2 parts: dom0 patches and xen hypervisor
>>>> patches.
>>>>
>>>> At dom0 side, patches include
>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>> logic, providing a native interface for natvie platform and a
>>>> paravirt template for paravirt platform, so that os can implicitly
>>>> hook to proper ops accordingly; [Patch 2/3] redirect paravirt
>>>> template to Xen pv ops; [Patch 3/3] implement Xen pad logic, and
>>>> when getting pad device notification, it hypercalls to Xen
>>>> hypervisor for core parking. Due to the characteristic of xen
>>>> continue_hypercall_on_cpu, dom0 seperately send/get core parking
>>>> request/result;
>>>>
>>>> At Xen hypervisor side, patches include
>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>> parking request, and get core parking result;
>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>> sequence has different power/performance result, due to cpu
>>>> socket/core/thread topology. This patch provide power-first and
>>>> performance-first policies, users can choose core parking policy on
>>>> their own demand, considering power and performance tradeoff.
>>>
>>> Does this really need to be implemented in the hypervisor? All this
>>> boils down to is a wrapper around cpu_down() and cpu_up(), which
>>> have hypercall interfaces already. So I'd rather see this as being
>>> an extension to Dom0's pCPU management patches (which aren't
>>> upstream afaict)...
>>>
>>> Jan
>>
>> It's a design choice. Core parking is not only a wrapper around
>> cpu_down/up, it also involves policy algorithms which depend on
>> physical cpu topology and cpu_online/present_map, etc. Implement
>> core parking at dom0 side need expose all those information to dom0,
>> with potential issues (like coherence), while dom0 still need do
>> same work as hypervisor.
>> Our idea is to keep dom0 as ACPI parser, then hypercall and do rest
>> things at hypervisor side.
>
> Actually, after some more thought, I don't even think this ought to
> be implemented in the Dom0 kernel, but in user space altogether.
> Afaict all information necessary is already being exposed.
>
No, user space lack necessary information. If I didn't misunderstand, it has some dom0-side dependencies not ready now, like
1. sysfs interface, and exposing xen pcpu topology and maps;
2. intecept pad notify and call usermodehelper;
3. a daemon to monitor/policy core parking (daemon enable when linux run as pvops under xen (kernel acpi_pad disable now), daemon disable when linux run under baremetal (kernel acpi_pad enable now))
Seems keep same approach as native kernel which handle acpi_pad in kernel side (for us, in hypervisor side) is a reasonable choice. Per my understanding core parking is a co-work part of NPTM, the whole process is basically a remote controller-microengine-bios-kernel process, not necessarily involve user action.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
[not found] ` <DE8DF0795D48FD4CA783C40EC82923350A7F35@SHSMSX101.ccr.corp.intel.com>
@ 2012-02-29 12:41 ` Liu, Jinsong
2012-02-29 12:47 ` Liu, Jinsong
2012-02-29 13:47 ` Jan Beulich
0 siblings, 2 replies; 22+ messages in thread
From: Liu, Jinsong @ 2012-02-29 12:41 UTC (permalink / raw)
To: 'Jan Beulich'
Cc: Liu, Jinsong, Li, Shaohua, 'keir.xen@gmail.com',
'Konrad RzeszutekWilk', 'xen-devel'
Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote:
>>>>> Core parking is a power control feature and it can co-work with
>>>>> NPTM to control system power budget through online/offline some
>>>>> CPUs in the system. These patches implement core parking feature
>>>>> for xen. They consist of 2 parts: dom0 patches and xen hypervisor
>>>>> patches.
>>>>>
>>>>> At dom0 side, patches include
>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>> logic, providing a native interface for natvie platform and a
>>>>> paravirt template for paravirt platform, so that os can implicitly
>>>>> hook to proper ops accordingly; [Patch 2/3] redirect paravirt
>>>>> template to Xen pv ops; [Patch 3/3] implement Xen pad logic, and
>>>>> when getting pad device notification, it hypercalls to Xen
>>>>> hypervisor for core parking. Due to the characteristic of xen
>>>>> continue_hypercall_on_cpu, dom0 seperately send/get core parking
>>>>> request/result;
>>>>>
>>>>> At Xen hypervisor side, patches include
>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>> parking request, and get core parking result;
>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>> sequence has different power/performance result, due to cpu
>>>>> socket/core/thread topology. This patch provide power-first and
>>>>> performance-first policies, users can choose core parking policy
>>>>> on their own demand, considering power and performance tradeoff.
>>>>
>>>> Does this really need to be implemented in the hypervisor? All this
>>>> boils down to is a wrapper around cpu_down() and cpu_up(), which
>>>> have hypercall interfaces already. So I'd rather see this as being
>>>> an extension to Dom0's pCPU management patches (which aren't
>>>> upstream afaict)...
>>>>
>>>> Jan
>>>
>>> It's a design choice. Core parking is not only a wrapper around
>>> cpu_down/up, it also involves policy algorithms which depend on
>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>> core parking at dom0 side need expose all those information to dom0,
>>> with potential issues (like coherence), while dom0 still need do
>>> same work as hypervisor. Our idea is to keep dom0 as ACPI parser,
>>> then hypercall and do rest things at hypervisor side.
>>
>> Actually, after some more thought, I don't even think this ought to
>> be implemented in the Dom0 kernel, but in user space altogether.
>> Afaict all information necessary is already being exposed.
>>
>
> No, user space lack necessary information. If I didn't misunderstand,
> it has some dom0-side dependencies not ready now, like
> 1. sysfs interface, and exposing xen pcpu topology and maps;
> 2. intecept pad notify and call usermodehelper;
> 3. a daemon to monitor/policy core parking (daemon enable when linux
> run as pvops under xen (kernel acpi_pad disable now), daemon disable
> when linux run under baremetal (kernel acpi_pad enable now))
>
> Seems keep same approach as native kernel which handle acpi_pad in
> kernel side (for us, in hypervisor side) is a reasonable choice. Per
> my understanding core parking is a co-work part of NPTM, the whole
> process is basically a remote controller-microengine-bios-kernel
> process, not necessarily involve user action.
>
Any comments?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-29 12:41 ` Liu, Jinsong
@ 2012-02-29 12:47 ` Liu, Jinsong
2012-02-29 13:47 ` Jan Beulich
1 sibling, 0 replies; 22+ messages in thread
From: Liu, Jinsong @ 2012-02-29 12:47 UTC (permalink / raw)
To: 'Jan Beulich'
Cc: Li, Shaohua, 'keir.xen@gmail.com',
'Konrad RzeszutekWilk', 'xen-devel'
[-- Attachment #1: Type: text/plain, Size: 3689 bytes --]
Liu, Jinsong wrote:
> Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Core parking is a power control feature and it can co-work with
>>>>>> NPTM to control system power budget through online/offline some
>>>>>> CPUs in the system. These patches implement core parking feature
>>>>>> for xen. They consist of 2 parts: dom0 patches and xen
>>>>>> hypervisor patches.
>>>>>>
>>>>>> At dom0 side, patches include
>>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>>> logic, providing a native interface for natvie platform and a
>>>>>> paravirt template for paravirt platform, so that os can
>>>>>> implicitly hook to proper ops accordingly; [Patch 2/3] redirect
>>>>>> paravirt template to Xen pv ops; [Patch 3/3] implement Xen pad
>>>>>> logic, and when getting pad device notification, it hypercalls
>>>>>> to Xen hypervisor for core parking. Due to the characteristic of
>>>>>> xen continue_hypercall_on_cpu, dom0 seperately send/get core
>>>>>> parking request/result;
>>>>>>
>>>>>> At Xen hypervisor side, patches include
>>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>>> parking request, and get core parking result;
>>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>>> sequence has different power/performance result, due to cpu
>>>>>> socket/core/thread topology. This patch provide power-first and
>>>>>> performance-first policies, users can choose core parking policy
>>>>>> on their own demand, considering power and performance tradeoff.
>>>>>
>>>>> Does this really need to be implemented in the hypervisor? All
>>>>> this boils down to is a wrapper around cpu_down() and cpu_up(),
>>>>> which have hypercall interfaces already. So I'd rather see this
>>>>> as being an extension to Dom0's pCPU management patches (which
>>>>> aren't upstream afaict)...
>>>>>
>>>>> Jan
>>>>
>>>> It's a design choice. Core parking is not only a wrapper around
>>>> cpu_down/up, it also involves policy algorithms which depend on
>>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>>> core parking at dom0 side need expose all those information to
>>>> dom0, with potential issues (like coherence), while dom0 still
>>>> need do same work as hypervisor. Our idea is to keep dom0 as ACPI
>>>> parser, then hypercall and do rest things at hypervisor side.
>>>
>>> Actually, after some more thought, I don't even think this ought to
>>> be implemented in the Dom0 kernel, but in user space altogether.
>>> Afaict all information necessary is already being exposed.
>>>
>>
>> No, user space lack necessary information. If I didn't misunderstand,
>> it has some dom0-side dependencies not ready now, like
>> 1. sysfs interface, and exposing xen pcpu topology and maps;
>> 2. intecept pad notify and call usermodehelper;
>> 3. a daemon to monitor/policy core parking (daemon enable when linux
>> run as pvops under xen (kernel acpi_pad disable now), daemon disable
>> when linux run under baremetal (kernel acpi_pad enable now))
>>
>> Seems keep same approach as native kernel which handle acpi_pad in
>> kernel side (for us, in hypervisor side) is a reasonable choice. Per
>> my understanding core parking is a co-work part of NPTM, the whole
>> process is basically a remote controller-microengine-bios-kernel
>> process, not necessarily involve user action.
>>
>
> Any comments?
>
Sorry, forgot to re-attach patches :-)
[-- Attachment #2: xen_core_parking_1.patch --]
[-- Type: application/octet-stream, Size: 3395 bytes --]
Xen core parking 1: hypercall
This patch implement hypercall through which dom0 send core parking request, and get core parking result.
Due to the characteristic of continue_hypercall_on_cpu, dom0 seperately send/get core parking request/result.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
diff -r 92e03310878f xen/arch/x86/platform_hypercall.c
--- a/xen/arch/x86/platform_hypercall.c Wed Feb 08 21:05:52 2012 +0800
+++ b/xen/arch/x86/platform_hypercall.c Mon Feb 13 11:33:16 2012 +0800
@@ -56,6 +56,10 @@
long cpu_up_helper(void *data);
long cpu_down_helper(void *data);
+/* from core_parking.c */
+long core_parking_helper(void *data);
+uint32_t get_cur_idle_nums(void);
+
ret_t do_platform_op(XEN_GUEST_HANDLE(xen_platform_op_t) u_xenpf_op)
{
ret_t ret = 0;
@@ -593,6 +597,32 @@
op->u.mem_add.epfn,
op->u.mem_add.pxm);
break;
+
+ case XENPF_core_parking:
+ {
+ uint32_t idle_nums;
+
+ switch(op->u.core_parking.type)
+ {
+ case XEN_CORE_PARKING_SET:
+ idle_nums = min_t(uint32_t,
+ op->u.core_parking.idle_nums, num_present_cpus() - 1);
+ ret = continue_hypercall_on_cpu(
+ 0, core_parking_helper, (void *)(unsigned long)idle_nums);
+ break;
+
+ case XEN_CORE_PARKING_GET:
+ op->u.core_parking.idle_nums = get_cur_idle_nums();
+ ret = copy_to_guest(u_xenpf_op, op, 1) ? -EFAULT : 0;
+ break;
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ }
+ break;
+
default:
ret = -ENOSYS;
break;
diff -r 92e03310878f xen/common/Makefile
--- a/xen/common/Makefile Wed Feb 08 21:05:52 2012 +0800
+++ b/xen/common/Makefile Mon Feb 13 11:33:16 2012 +0800
@@ -1,6 +1,7 @@
obj-y += bitmap.o
obj-y += cpu.o
obj-y += cpupool.o
+obj-y += core_parking.o
obj-y += domctl.o
obj-y += domain.o
obj-y += event_channel.o
diff -r 92e03310878f xen/common/core_parking.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/common/core_parking.c Mon Feb 13 11:33:16 2012 +0800
@@ -0,0 +1,13 @@
+#include <xen/types.h>
+
+static uint32_t cur_idle_nums;
+
+long core_parking_helper(void *data)
+{
+ return 0;
+}
+
+uint32_t get_cur_idle_nums(void)
+{
+ return cur_idle_nums;
+}
diff -r 92e03310878f xen/include/public/platform.h
--- a/xen/include/public/platform.h Wed Feb 08 21:05:52 2012 +0800
+++ b/xen/include/public/platform.h Mon Feb 13 11:33:16 2012 +0800
@@ -490,6 +490,20 @@
uint32_t flags;
};
+#define XENPF_core_parking 60
+
+#define XEN_CORE_PARKING_SET 1
+#define XEN_CORE_PARKING_GET 2
+struct xenpf_core_parking {
+ /* IN variables */
+ uint32_t type;
+ /* IN variables: set cpu nums expected to be idled */
+ /* OUT variables: get cpu nums actually be idled */
+ uint32_t idle_nums;
+};
+typedef struct xenpf_core_parking xenpf_core_parking_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
+
struct xen_platform_op {
uint32_t cmd;
uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
@@ -511,6 +525,7 @@
struct xenpf_cpu_ol cpu_ol;
struct xenpf_cpu_hotadd cpu_add;
struct xenpf_mem_hotadd mem_add;
+ struct xenpf_core_parking core_parking;
uint8_t pad[128];
} u;
};
[-- Attachment #3: xen_core_parking_2.patch --]
[-- Type: application/octet-stream, Size: 7292 bytes --]
Xen core parking 2: core parking implementation
This patch implement Xen core parking.
Different core parking sequence has different power/performance result, due to cpu socket/core/thread topology.
This patch provide power-first and performance-first policies, users can choose core parking policy by their own demand.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
diff -r 392e77171d74 xen/common/core_parking.c
--- a/xen/common/core_parking.c Mon Feb 13 11:33:17 2012 +0800
+++ b/xen/common/core_parking.c Thu Feb 16 23:39:57 2012 +0800
@@ -1,13 +1,240 @@
+/*
+ * core_parking.c - implement core parking according to dom0 requirement
+ *
+ * Copyright (C) 2012, Intel Corporation.
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
#include <xen/types.h>
+#include <xen/cpu.h>
+#include <xen/init.h>
+#include <xen/cpumask.h>
+#include <asm/percpu.h>
+#include <asm/smp.h>
+
+#define CORE_PARKING_INCREMENT 1
+#define CORE_PARKING_DECREMENT 2
+
+static unsigned int core_parking_power(unsigned int event);
+static unsigned int core_parking_performance(unsigned int event);
static uint32_t cur_idle_nums;
+static unsigned int core_parking_cpunum[NR_CPUS] = {[0 ... NR_CPUS-1] = -1};
+
+static struct core_parking_policy {
+ char name[30];
+ unsigned int (*next)(unsigned int event);
+} *core_parking_policy;
+
+static enum core_parking_controller {
+ POWER_FIRST,
+ PERFORMANCE_FIRST
+} core_parking_controller = POWER_FIRST;
+
+static void __init setup_core_parking_option(char *str)
+{
+ if ( !strcmp(str, "power") )
+ core_parking_controller = POWER_FIRST;
+ else if ( !strcmp(str, "performance") )
+ core_parking_controller = PERFORMANCE_FIRST;
+ else
+ return;
+}
+custom_param("core_parking", setup_core_parking_option);
+
+static unsigned int core_parking_performance(unsigned int event)
+{
+ unsigned int cpu = -1;
+
+ switch ( event )
+ {
+ case CORE_PARKING_INCREMENT:
+ {
+ int core_tmp, core_weight = -1;
+ int sibling_tmp, sibling_weight = -1;
+ cpumask_t core_candidate_map, sibling_candidate_map;
+ cpumask_clear(&core_candidate_map);
+ cpumask_clear(&sibling_candidate_map);
+
+ for_each_cpu(cpu, &cpu_online_map)
+ {
+ if ( cpu == 0 )
+ continue;
+
+ core_tmp = cpumask_weight(per_cpu(cpu_core_mask, cpu));
+ if ( core_weight < core_tmp )
+ {
+ core_weight = core_tmp;
+ cpumask_clear(&core_candidate_map);
+ cpumask_set_cpu(cpu, &core_candidate_map);
+ }
+ else if ( core_weight == core_tmp )
+ cpumask_set_cpu(cpu, &core_candidate_map);
+ }
+
+ for_each_cpu(cpu, &core_candidate_map)
+ {
+ sibling_tmp = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+ if ( sibling_weight < sibling_tmp )
+ {
+ sibling_weight = sibling_tmp;
+ cpumask_clear(&sibling_candidate_map);
+ cpumask_set_cpu(cpu, &sibling_candidate_map);
+ }
+ else if ( sibling_weight == sibling_tmp )
+ cpumask_set_cpu(cpu, &sibling_candidate_map);
+ }
+
+ cpu = cpumask_first(&sibling_candidate_map);
+ }
+ break;
+
+ case CORE_PARKING_DECREMENT:
+ {
+ cpu = core_parking_cpunum[cur_idle_nums -1];
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return cpu;
+}
+
+static unsigned int core_parking_power(unsigned int event)
+{
+ unsigned int cpu = -1;
+
+ switch ( event )
+ {
+ case CORE_PARKING_INCREMENT:
+ {
+ int core_tmp, core_weight = NR_CPUS + 1;
+ int sibling_tmp, sibling_weight = NR_CPUS + 1;
+ cpumask_t core_candidate_map, sibling_candidate_map;
+ cpumask_clear(&core_candidate_map);
+ cpumask_clear(&sibling_candidate_map);
+
+ for_each_cpu(cpu, &cpu_online_map)
+ {
+ if ( cpu == 0 )
+ continue;
+
+ core_tmp = cpumask_weight(per_cpu(cpu_core_mask, cpu));
+ if ( core_weight > core_tmp )
+ {
+ core_weight = core_tmp;
+ cpumask_clear(&core_candidate_map);
+ cpumask_set_cpu(cpu, &core_candidate_map);
+ }
+ else if ( core_weight == core_tmp )
+ cpumask_set_cpu(cpu, &core_candidate_map);
+ }
+
+ for_each_cpu(cpu, &core_candidate_map)
+ {
+ sibling_tmp = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+ if ( sibling_weight > sibling_tmp )
+ {
+ sibling_weight = sibling_tmp;
+ cpumask_clear(&sibling_candidate_map);
+ cpumask_set_cpu(cpu, &sibling_candidate_map);
+ }
+ else if ( sibling_weight == sibling_tmp )
+ cpumask_set_cpu(cpu, &sibling_candidate_map);
+ }
+
+ cpu = cpumask_first(&sibling_candidate_map);
+ }
+ break;
+
+ case CORE_PARKING_DECREMENT:
+ {
+ cpu = core_parking_cpunum[cur_idle_nums -1];
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return cpu;
+}
long core_parking_helper(void *data)
{
- return 0;
+ uint32_t idle_nums = (unsigned long)data;
+ unsigned int cpu;
+ int ret = 0;
+
+ if ( !core_parking_policy )
+ return -EINVAL;
+
+ while ( cur_idle_nums < idle_nums )
+ {
+ cpu = core_parking_policy->next(CORE_PARKING_INCREMENT);
+ ret = cpu_down(cpu);
+ if ( ret )
+ return ret;
+ core_parking_cpunum[cur_idle_nums++] = cpu;
+ }
+
+ while ( cur_idle_nums > idle_nums )
+ {
+ cpu = core_parking_policy->next(CORE_PARKING_DECREMENT);
+ ret = cpu_up(cpu);
+ if ( ret )
+ return ret;
+ core_parking_cpunum[--cur_idle_nums] = -1;
+ }
+
+ return ret;
}
uint32_t get_cur_idle_nums(void)
{
return cur_idle_nums;
}
+
+static struct core_parking_policy power_first = {
+ .name = "power",
+ .next = core_parking_power,
+};
+
+static struct core_parking_policy performance_first = {
+ .name = "performance",
+ .next = core_parking_performance,
+};
+
+static int register_core_parking_policy(struct core_parking_policy *policy)
+{
+ if ( !policy || !policy->next )
+ return -EINVAL;
+
+ core_parking_policy = policy;
+ return 0;
+}
+
+static int __init core_parking_init(void)
+{
+ int ret = 0;
+
+ if ( core_parking_controller == PERFORMANCE_FIRST )
+ ret = register_core_parking_policy(&performance_first);
+ else
+ ret = register_core_parking_policy(&power_first);
+
+ return ret;
+}
+__initcall(core_parking_init);
[-- Attachment #4: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-29 12:41 ` Liu, Jinsong
2012-02-29 12:47 ` Liu, Jinsong
@ 2012-02-29 13:47 ` Jan Beulich
2012-03-01 8:20 ` Liu, Jinsong
1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-02-29 13:47 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
>>> On 29.02.12 at 13:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Core parking is a power control feature and it can co-work with
>>>>>> NPTM to control system power budget through online/offline some
>>>>>> CPUs in the system. These patches implement core parking feature
>>>>>> for xen. They consist of 2 parts: dom0 patches and xen hypervisor
>>>>>> patches.
>>>>>>
>>>>>> At dom0 side, patches include
>>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>>> logic, providing a native interface for natvie platform and a
>>>>>> paravirt template for paravirt platform, so that os can implicitly
>>>>>> hook to proper ops accordingly; [Patch 2/3] redirect paravirt
>>>>>> template to Xen pv ops; [Patch 3/3] implement Xen pad logic, and
>>>>>> when getting pad device notification, it hypercalls to Xen
>>>>>> hypervisor for core parking. Due to the characteristic of xen
>>>>>> continue_hypercall_on_cpu, dom0 seperately send/get core parking
>>>>>> request/result;
>>>>>>
>>>>>> At Xen hypervisor side, patches include
>>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>>> parking request, and get core parking result;
>>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>>> sequence has different power/performance result, due to cpu
>>>>>> socket/core/thread topology. This patch provide power-first and
>>>>>> performance-first policies, users can choose core parking policy
>>>>>> on their own demand, considering power and performance tradeoff.
>>>>>
>>>>> Does this really need to be implemented in the hypervisor? All this
>>>>> boils down to is a wrapper around cpu_down() and cpu_up(), which
>>>>> have hypercall interfaces already. So I'd rather see this as being
>>>>> an extension to Dom0's pCPU management patches (which aren't
>>>>> upstream afaict)...
>>>>>
>>>>> Jan
>>>>
>>>> It's a design choice. Core parking is not only a wrapper around
>>>> cpu_down/up, it also involves policy algorithms which depend on
>>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>>> core parking at dom0 side need expose all those information to dom0,
>>>> with potential issues (like coherence), while dom0 still need do
>>>> same work as hypervisor. Our idea is to keep dom0 as ACPI parser,
>>>> then hypercall and do rest things at hypervisor side.
>>>
>>> Actually, after some more thought, I don't even think this ought to
>>> be implemented in the Dom0 kernel, but in user space altogether.
>>> Afaict all information necessary is already being exposed.
>>>
>>
>> No, user space lack necessary information. If I didn't misunderstand,
>> it has some dom0-side dependencies not ready now, like
>> 1. sysfs interface, and exposing xen pcpu topology and maps;
>> 2. intecept pad notify and call usermodehelper;
>> 3. a daemon to monitor/policy core parking (daemon enable when linux
>> run as pvops under xen (kernel acpi_pad disable now), daemon disable
>> when linux run under baremetal (kernel acpi_pad enable now))
>>
>> Seems keep same approach as native kernel which handle acpi_pad in
>> kernel side (for us, in hypervisor side) is a reasonable choice. Per
>> my understanding core parking is a co-work part of NPTM, the whole
>> process is basically a remote controller-microengine-bios-kernel
>> process, not necessarily involve user action.
>>
>
> Any comments?
No - I continue to disagree that this needs to be done outside of
user space (the fact that certain necessary kernel pieces aren't in
pv-ops is no excuse, nor is it that native does this in the kernel -
that would at most allow for implementing it in the kernel, but still
won't justify doing it in the hypervisor).
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-02-29 13:47 ` Jan Beulich
@ 2012-03-01 8:20 ` Liu, Jinsong
2012-03-01 8:50 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-03-01 8:20 UTC (permalink / raw)
To: Jan Beulich
Cc: Li, Shaohua, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
Jan Beulich wrote:
>>>> On 29.02.12 at 13:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> Core parking is a power control feature and it can co-work with
>>>>>>> NPTM to control system power budget through online/offline some
>>>>>>> CPUs in the system. These patches implement core parking feature
>>>>>>> for xen. They consist of 2 parts: dom0 patches and xen
>>>>>>> hypervisor patches.
>>>>>>>
>>>>>>> At dom0 side, patches include
>>>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>>>> logic, providing a native interface for natvie platform and a
>>>>>>> paravirt template for paravirt platform, so that os can
>>>>>>> implicitly hook to proper ops accordingly; [Patch 2/3] redirect
>>>>>>> paravirt template to Xen pv ops; [Patch 3/3] implement Xen pad
>>>>>>> logic, and when getting pad device notification, it hypercalls
>>>>>>> to Xen hypervisor for core parking. Due to the characteristic
>>>>>>> of xen continue_hypercall_on_cpu, dom0 seperately send/get core
>>>>>>> parking request/result;
>>>>>>>
>>>>>>> At Xen hypervisor side, patches include
>>>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>>>> parking request, and get core parking result;
>>>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>>>> sequence has different power/performance result, due to cpu
>>>>>>> socket/core/thread topology. This patch provide power-first and
>>>>>>> performance-first policies, users can choose core parking policy
>>>>>>> on their own demand, considering power and performance tradeoff.
>>>>>>
>>>>>> Does this really need to be implemented in the hypervisor? All
>>>>>> this boils down to is a wrapper around cpu_down() and cpu_up(),
>>>>>> which have hypercall interfaces already. So I'd rather see this
>>>>>> as being an extension to Dom0's pCPU management patches (which
>>>>>> aren't upstream afaict)...
>>>>>>
>>>>>> Jan
>>>>>
>>>>> It's a design choice. Core parking is not only a wrapper around
>>>>> cpu_down/up, it also involves policy algorithms which depend on
>>>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>>>> core parking at dom0 side need expose all those information to
>>>>> dom0, with potential issues (like coherence), while dom0 still
>>>>> need do same work as hypervisor. Our idea is to keep dom0 as ACPI
>>>>> parser, then hypercall and do rest things at hypervisor side.
>>>>
>>>> Actually, after some more thought, I don't even think this ought to
>>>> be implemented in the Dom0 kernel, but in user space altogether.
>>>> Afaict all information necessary is already being exposed.
>>>>
>>>
>>> No, user space lack necessary information. If I didn't
>>> misunderstand, it has some dom0-side dependencies not ready now,
>>> like
>>> 1. sysfs interface, and exposing xen pcpu topology and maps;
>>> 2. intecept pad notify and call usermodehelper;
>>> 3. a daemon to monitor/policy core parking (daemon enable when linux
>>> run as pvops under xen (kernel acpi_pad disable now), daemon disable
>>> when linux run under baremetal (kernel acpi_pad enable now))
>>>
>>> Seems keep same approach as native kernel which handle acpi_pad in
>>> kernel side (for us, in hypervisor side) is a reasonable choice. Per
>>> my understanding core parking is a co-work part of NPTM, the whole
>>> process is basically a remote controller-microengine-bios-kernel
>>> process, not necessarily involve user action.
>>>
>>
>> Any comments?
>
> No - I continue to disagree that this needs to be done outside of
> user space (the fact that certain necessary kernel pieces aren't in
> pv-ops is no excuse, nor is it that native does this in the kernel -
> that would at most allow for implementing it in the kernel, but still
> won't justify doing it in the hypervisor).
>
Jan, could you elaborate more your thoughts? like
- the pros of user space approach (and cons, if it has);
- the disadvantages of hypervisor approach;
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 8:20 ` Liu, Jinsong
@ 2012-03-01 8:50 ` Jan Beulich
2012-03-01 11:14 ` Liu, Jinsong
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-03-01 8:50 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
>>> On 01.03.12 at 09:20, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 29.02.12 at 13:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Liu, Jinsong wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>> wrote:
>>>>>>>> Core parking is a power control feature and it can co-work with
>>>>>>>> NPTM to control system power budget through online/offline some
>>>>>>>> CPUs in the system. These patches implement core parking feature
>>>>>>>> for xen. They consist of 2 parts: dom0 patches and xen
>>>>>>>> hypervisor patches.
>>>>>>>>
>>>>>>>> At dom0 side, patches include
>>>>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>>>>> logic, providing a native interface for natvie platform and a
>>>>>>>> paravirt template for paravirt platform, so that os can
>>>>>>>> implicitly hook to proper ops accordingly; [Patch 2/3] redirect
>>>>>>>> paravirt template to Xen pv ops; [Patch 3/3] implement Xen pad
>>>>>>>> logic, and when getting pad device notification, it hypercalls
>>>>>>>> to Xen hypervisor for core parking. Due to the characteristic
>>>>>>>> of xen continue_hypercall_on_cpu, dom0 seperately send/get core
>>>>>>>> parking request/result;
>>>>>>>>
>>>>>>>> At Xen hypervisor side, patches include
>>>>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>>>>> parking request, and get core parking result;
>>>>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>>>>> sequence has different power/performance result, due to cpu
>>>>>>>> socket/core/thread topology. This patch provide power-first and
>>>>>>>> performance-first policies, users can choose core parking policy
>>>>>>>> on their own demand, considering power and performance tradeoff.
>>>>>>>
>>>>>>> Does this really need to be implemented in the hypervisor? All
>>>>>>> this boils down to is a wrapper around cpu_down() and cpu_up(),
>>>>>>> which have hypercall interfaces already. So I'd rather see this
>>>>>>> as being an extension to Dom0's pCPU management patches (which
>>>>>>> aren't upstream afaict)...
>>>>>>>
>>>>>>> Jan
>>>>>>
>>>>>> It's a design choice. Core parking is not only a wrapper around
>>>>>> cpu_down/up, it also involves policy algorithms which depend on
>>>>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>>>>> core parking at dom0 side need expose all those information to
>>>>>> dom0, with potential issues (like coherence), while dom0 still
>>>>>> need do same work as hypervisor. Our idea is to keep dom0 as ACPI
>>>>>> parser, then hypercall and do rest things at hypervisor side.
>>>>>
>>>>> Actually, after some more thought, I don't even think this ought to
>>>>> be implemented in the Dom0 kernel, but in user space altogether.
>>>>> Afaict all information necessary is already being exposed.
>>>>>
>>>>
>>>> No, user space lack necessary information. If I didn't
>>>> misunderstand, it has some dom0-side dependencies not ready now,
>>>> like
>>>> 1. sysfs interface, and exposing xen pcpu topology and maps;
>>>> 2. intecept pad notify and call usermodehelper;
>>>> 3. a daemon to monitor/policy core parking (daemon enable when linux
>>>> run as pvops under xen (kernel acpi_pad disable now), daemon disable
>>>> when linux run under baremetal (kernel acpi_pad enable now))
>>>>
>>>> Seems keep same approach as native kernel which handle acpi_pad in
>>>> kernel side (for us, in hypervisor side) is a reasonable choice. Per
>>>> my understanding core parking is a co-work part of NPTM, the whole
>>>> process is basically a remote controller-microengine-bios-kernel
>>>> process, not necessarily involve user action.
>>>>
>>>
>>> Any comments?
>>
>> No - I continue to disagree that this needs to be done outside of
>> user space (the fact that certain necessary kernel pieces aren't in
>> pv-ops is no excuse, nor is it that native does this in the kernel -
>> that would at most allow for implementing it in the kernel, but still
>> won't justify doing it in the hypervisor).
>>
>
> Jan, could you elaborate more your thoughts? like
> - the pros of user space approach (and cons, if it has);
> - the disadvantages of hypervisor approach;
Whenever a user space implementation is possible (and not too
cumbersome), I think it ought to be preferred. Even more so when
it involves policy decisions.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 8:50 ` Jan Beulich
@ 2012-03-01 11:14 ` Liu, Jinsong
2012-03-01 11:21 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-03-01 11:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Li, Shaohua, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
Jan Beulich wrote:
>>>> On 01.03.12 at 09:20, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 29.02.12 at 13:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Liu, Jinsong wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 17.02.12 at 18:48, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> Jan Beulich wrote:
>>>>>>>>>>> On 17.02.12 at 09:54, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>> wrote:
>>>>>>>>> Core parking is a power control feature and it can co-work
>>>>>>>>> with NPTM to control system power budget through
>>>>>>>>> online/offline some CPUs in the system. These patches
>>>>>>>>> implement core parking feature for xen. They consist of 2
>>>>>>>>> parts: dom0 patches and xen hypervisor patches.
>>>>>>>>>
>>>>>>>>> At dom0 side, patches include
>>>>>>>>> [Patch 1/3] intercept native pad (Processor Aggregator Device)
>>>>>>>>> logic, providing a native interface for natvie platform and a
>>>>>>>>> paravirt template for paravirt platform, so that os can
>>>>>>>>> implicitly hook to proper ops accordingly; [Patch 2/3]
>>>>>>>>> redirect paravirt template to Xen pv ops; [Patch 3/3]
>>>>>>>>> implement Xen pad logic, and when getting pad device
>>>>>>>>> notification, it hypercalls to Xen hypervisor for core
>>>>>>>>> parking. Due to the characteristic of xen
>>>>>>>>> continue_hypercall_on_cpu, dom0 seperately send/get core
>>>>>>>>> parking request/result;
>>>>>>>>>
>>>>>>>>> At Xen hypervisor side, patches include
>>>>>>>>> [Patch 1/2] implement hypercall through which dom0 send core
>>>>>>>>> parking request, and get core parking result;
>>>>>>>>> [Patch 2/2] implement Xen core parking. Different core parking
>>>>>>>>> sequence has different power/performance result, due to cpu
>>>>>>>>> socket/core/thread topology. This patch provide power-first
>>>>>>>>> and performance-first policies, users can choose core parking
>>>>>>>>> policy on their own demand, considering power and performance
>>>>>>>>> tradeoff.
>>>>>>>>
>>>>>>>> Does this really need to be implemented in the hypervisor? All
>>>>>>>> this boils down to is a wrapper around cpu_down() and cpu_up(),
>>>>>>>> which have hypercall interfaces already. So I'd rather see this
>>>>>>>> as being an extension to Dom0's pCPU management patches (which
>>>>>>>> aren't upstream afaict)...
>>>>>>>>
>>>>>>>> Jan
>>>>>>>
>>>>>>> It's a design choice. Core parking is not only a wrapper around
>>>>>>> cpu_down/up, it also involves policy algorithms which depend on
>>>>>>> physical cpu topology and cpu_online/present_map, etc. Implement
>>>>>>> core parking at dom0 side need expose all those information to
>>>>>>> dom0, with potential issues (like coherence), while dom0 still
>>>>>>> need do same work as hypervisor. Our idea is to keep dom0 as
>>>>>>> ACPI parser, then hypercall and do rest things at hypervisor
>>>>>>> side.
>>>>>>
>>>>>> Actually, after some more thought, I don't even think this ought
>>>>>> to be implemented in the Dom0 kernel, but in user space
>>>>>> altogether. Afaict all information necessary is already being
>>>>>> exposed.
>>>>>>
>>>>>
>>>>> No, user space lack necessary information. If I didn't
>>>>> misunderstand, it has some dom0-side dependencies not ready now,
>>>>> like
>>>>> 1. sysfs interface, and exposing xen pcpu topology and maps;
>>>>> 2. intecept pad notify and call usermodehelper;
>>>>> 3. a daemon to monitor/policy core parking (daemon enable when
>>>>> linux run as pvops under xen (kernel acpi_pad disable now),
>>>>> daemon disable when linux run under baremetal (kernel acpi_pad
>>>>> enable now))
>>>>>
>>>>> Seems keep same approach as native kernel which handle acpi_pad in
>>>>> kernel side (for us, in hypervisor side) is a reasonable choice.
>>>>> Per my understanding core parking is a co-work part of NPTM, the
>>>>> whole process is basically a remote
>>>>> controller-microengine-bios-kernel process, not necessarily
>>>>> involve user action.
>>>>>
>>>>
>>>> Any comments?
>>>
>>> No - I continue to disagree that this needs to be done outside of
>>> user space (the fact that certain necessary kernel pieces aren't in
>>> pv-ops is no excuse, nor is it that native does this in the kernel -
>>> that would at most allow for implementing it in the kernel, but
>>> still won't justify doing it in the hypervisor).
>>>
>>
>> Jan, could you elaborate more your thoughts? like
>> - the pros of user space approach (and cons, if it has);
>> - the disadvantages of hypervisor approach;
>
> Whenever a user space implementation is possible (and not too
> cumbersome), I think it ought to be preferred. Even more so when it involves policy decisions.
>
Unfortunately, yes, though cumbersome is not basic reason user space approach is not preferred.
Core parking is a power management staff, based on dynamic physical details like cpu topologies and maps owned by hypervisor. It's natural to implement it at hypervisor side, like what other xen power management staffs do. Based on same reason does native linux choose to implement it at kernel, not at user space.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 11:14 ` Liu, Jinsong
@ 2012-03-01 11:21 ` Jan Beulich
2012-03-01 14:31 ` Liu, Jinsong
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-03-01 11:21 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Unfortunately, yes, though cumbersome is not basic reason user space
> approach is not preferred.
> Core parking is a power management staff, based on dynamic physical details
> like cpu topologies and maps owned by hypervisor. It's natural to implement
CPU topology is available to user space, and as far as I recall your
hypervisor patch didn't really manipulate any maps - all it did was pick
what CPU to bring up/down, and then carry out that decision.
> it at hypervisor side, like what other xen power management staffs do. Based
Other power management stuff can't be easily done in user space.
> on same reason does native linux choose to implement it at kernel, not at
> user space.
Sorry, no, I don't consider Linux having taken (or going to take - after
all I can't find the word "park" when grepping 3.3-rc5's drivers/acpi/, so
I assume the patch isn't in yet) a questionable decision a reason for
Xen to follow blindly.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 11:21 ` Jan Beulich
@ 2012-03-01 14:31 ` Liu, Jinsong
2012-03-01 15:11 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-03-01 14:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Li, Shaohua, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
Jan Beulich wrote:
>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Unfortunately, yes, though cumbersome is not basic reason user space
>> approach is not preferred. Core parking is a power management staff,
>> based on dynamic physical details like cpu topologies and maps owned
>> by hypervisor. It's natural to implement
>
> CPU topology is available to user space, and as far as I recall your
> hypervisor patch didn't really manipulate any maps - all it did was
> pick what CPU to bring up/down, and then carry out that decision.
No. threads_per_core and cores_per_socket exposed to userspace is pointless to us (and, it's questionable need fixup).
Core parking depends on following physical info (no matter where it implement):
1. cpu_online_map;
2. cpu_present_map;
3. cpu_core_mask;
4. cpu_sibling_mask;
all of them are *dynamic*, especially, 3/4 are varied per cpu and per online/offline ops.
For userspace, consider
- userspace need keep syncing with hypervisor, after each online/offline ops and for each cpu;
- extending new policy need expose whatever physical info it need;
technically it maybe possible to do so, but I doubt it's really necessary.
>
>> it at hypervisor side, like what other xen power management staffs
>> do. Based
>
> Other power management stuff can't be easily done in user space.
I think it's more appropriate not to separate 1 logic into 2 hierarchies.
>
>> on same reason does native linux choose to implement it at kernel,
>> not at user space.
>
> Sorry, no, I don't consider Linux having taken (or going to take -
> after all I can't find the word "park" when grepping 3.3-rc5's
> drivers/acpi/, so I assume the patch isn't in yet) a questionable
> decision a reason for
> Xen to follow blindly.
>
Kernel named it as acpi pad (Processor Aggregator Device).
Xen hypervisor didn't name it as 'pad' because it's too acpi specific, core parking is a more general name.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 14:31 ` Liu, Jinsong
@ 2012-03-01 15:11 ` Jan Beulich
2012-03-02 9:42 ` Haitao Shan
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-03-01 15:11 UTC (permalink / raw)
To: Jinsong Liu
Cc: Shaohua Li, 'keir.xen@gmail.com',
'KonradRzeszutekWilk', 'xen-devel'
>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>> approach is not preferred. Core parking is a power management staff,
>>> based on dynamic physical details like cpu topologies and maps owned
>>> by hypervisor. It's natural to implement
>>
>> CPU topology is available to user space, and as far as I recall your
>> hypervisor patch didn't really manipulate any maps - all it did was
>> pick what CPU to bring up/down, and then carry out that decision.
>
> No. threads_per_core and cores_per_socket exposed to userspace is pointless
> to us (and, it's questionable need fixup).
Sure this would be insufficient. But what do you think did
XEN_SYSCTL_topologyinfo get added for?
> Core parking depends on following physical info (no matter where it
> implement):
> 1. cpu_online_map;
> 2. cpu_present_map;
> 3. cpu_core_mask;
> 4. cpu_sibling_mask;
> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
> online/offline ops.
Afaict all of these can be reconstructed using (mostly sysctl)
hypercalls.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-01 15:11 ` Jan Beulich
@ 2012-03-02 9:42 ` Haitao Shan
2012-03-02 11:00 ` Keir Fraser
2012-03-02 11:46 ` Jan Beulich
0 siblings, 2 replies; 22+ messages in thread
From: Haitao Shan @ 2012-03-02 9:42 UTC (permalink / raw)
To: Jan Beulich
Cc: Jinsong Liu, Shaohua Li, keir.xen@gmail.com, xen-devel,
KonradRzeszutekWilk
I would really doubt the need to create a new interface of receiving
ACPI event and sending to user land (other than existing native
kernel) specifically for Xen. What's the benefit and why kernel people
should buy-in that?
Core parking is a platform feature, not virtualization feature.
Naturally following native approach is the most efficient. Why do you
want to create yet another interface for Xen to do that?
Shan Haitao
2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>>> approach is not preferred. Core parking is a power management staff,
>>>> based on dynamic physical details like cpu topologies and maps owned
>>>> by hypervisor. It's natural to implement
>>>
>>> CPU topology is available to user space, and as far as I recall your
>>> hypervisor patch didn't really manipulate any maps - all it did was
>>> pick what CPU to bring up/down, and then carry out that decision.
>>
>> No. threads_per_core and cores_per_socket exposed to userspace is pointless
>> to us (and, it's questionable need fixup).
>
> Sure this would be insufficient. But what do you think did
> XEN_SYSCTL_topologyinfo get added for?
>
>> Core parking depends on following physical info (no matter where it
>> implement):
>> 1. cpu_online_map;
>> 2. cpu_present_map;
>> 3. cpu_core_mask;
>> 4. cpu_sibling_mask;
>> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
>> online/offline ops.
>
> Afaict all of these can be reconstructed using (mostly sysctl)
> hypercalls.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-02 9:42 ` Haitao Shan
@ 2012-03-02 11:00 ` Keir Fraser
2012-03-05 21:27 ` Haitao Shan
2012-03-02 11:46 ` Jan Beulich
1 sibling, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2012-03-02 11:00 UTC (permalink / raw)
To: Haitao Shan, Jan Beulich
Cc: Liu, Jinsong, Shaohua Li, keir.xen@gmail.com, xen-devel,
Konrad Rzeszutek Wilk
On 02/03/2012 09:42, "Haitao Shan" <maillists.shan@gmail.com> wrote:
> I would really doubt the need to create a new interface of receiving
> ACPI event and sending to user land (other than existing native
> kernel) specifically for Xen. What's the benefit and why kernel people
> should buy-in that?
> Core parking is a platform feature, not virtualization feature.
> Naturally following native approach is the most efficient. Why do you
> want to create yet another interface for Xen to do that?
While I sympathise with your position rather more than Jan does, the fact is
that it's *you* who are suggesting yet-another-Xen-interface. Whereas doing
it in userspace requires only existing hypercalls I believe.
-- Keir
> Shan Haitao
>
> 2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>>>> approach is not preferred. Core parking is a power management staff,
>>>>> based on dynamic physical details like cpu topologies and maps owned
>>>>> by hypervisor. It's natural to implement
>>>>
>>>> CPU topology is available to user space, and as far as I recall your
>>>> hypervisor patch didn't really manipulate any maps - all it did was
>>>> pick what CPU to bring up/down, and then carry out that decision.
>>>
>>> No. threads_per_core and cores_per_socket exposed to userspace is pointless
>>> to us (and, it's questionable need fixup).
>>
>> Sure this would be insufficient. But what do you think did
>> XEN_SYSCTL_topologyinfo get added for?
>>
>>> Core parking depends on following physical info (no matter where it
>>> implement):
>>> 1. cpu_online_map;
>>> 2. cpu_present_map;
>>> 3. cpu_core_mask;
>>> 4. cpu_sibling_mask;
>>> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
>>> online/offline ops.
>>
>> Afaict all of these can be reconstructed using (mostly sysctl)
>> hypercalls.
>>
>> Jan
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-02 9:42 ` Haitao Shan
2012-03-02 11:00 ` Keir Fraser
@ 2012-03-02 11:46 ` Jan Beulich
2012-03-04 15:48 ` Liu, Jinsong
1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-03-02 11:46 UTC (permalink / raw)
To: Haitao Shan
Cc: Jinsong Liu, Shaohua Li, keir.xen@gmail.com, KonradRzeszutekWilk,
xen-devel
>>> On 02.03.12 at 10:42, Haitao Shan <maillists.shan@gmail.com> wrote:
> I would really doubt the need to create a new interface of receiving
> ACPI event and sending to user land (other than existing native
> kernel) specifically for Xen. What's the benefit and why kernel people
> should buy-in that?
Receiving ACPI events??? I don't recall having seen anything like that
in the proposed patch (since if that was the case, I would probably
agree that this is better done in kernel/hypervisor). Was what got
sent out perhaps just a small fraction of what is intended, and was it
forgotten to mention this?
Jan
> Core parking is a platform feature, not virtualization feature.
> Naturally following native approach is the most efficient. Why do you
> want to create yet another interface for Xen to do that?
>
> Shan Haitao
>
> 2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>>>> approach is not preferred. Core parking is a power management staff,
>>>>> based on dynamic physical details like cpu topologies and maps owned
>>>>> by hypervisor. It's natural to implement
>>>>
>>>> CPU topology is available to user space, and as far as I recall your
>>>> hypervisor patch didn't really manipulate any maps - all it did was
>>>> pick what CPU to bring up/down, and then carry out that decision.
>>>
>>> No. threads_per_core and cores_per_socket exposed to userspace is pointless
>>> to us (and, it's questionable need fixup).
>>
>> Sure this would be insufficient. But what do you think did
>> XEN_SYSCTL_topologyinfo get added for?
>>
>>> Core parking depends on following physical info (no matter where it
>>> implement):
>>> 1. cpu_online_map;
>>> 2. cpu_present_map;
>>> 3. cpu_core_mask;
>>> 4. cpu_sibling_mask;
>>> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
>>> online/offline ops.
>>
>> Afaict all of these can be reconstructed using (mostly sysctl)
>> hypercalls.
>>
>> Jan
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-02 11:46 ` Jan Beulich
@ 2012-03-04 15:48 ` Liu, Jinsong
2012-03-05 10:57 ` Jan Beulich
0 siblings, 1 reply; 22+ messages in thread
From: Liu, Jinsong @ 2012-03-04 15:48 UTC (permalink / raw)
To: Jan Beulich, Haitao Shan
Cc: Li, Shaohua, keir.xen@gmail.com, KonradRzeszutekWilk, xen-devel
[-- Attachment #1: Type: text/plain, Size: 2919 bytes --]
Jan Beulich wrote:
>>>> On 02.03.12 at 10:42, Haitao Shan <maillists.shan@gmail.com> wrote:
>> I would really doubt the need to create a new interface of receiving
>> ACPI event and sending to user land (other than existing native
>> kernel) specifically for Xen. What's the benefit and why kernel
>> people should buy-in that?
>
> Receiving ACPI events??? I don't recall having seen anything like that
> in the proposed patch (since if that was the case, I would probably
> agree that this is better done in kernel/hypervisor). Was what got
> sent out perhaps just a small fraction of what is intended, and was it
> forgotten to mention this?
>
> Jan
Yes, it does need to intercept ACPI PAD notify then handle it similar way as native kernel did. The logic is at dom0 '[PATCH 3/3] Xen pad logic and notification' as attached.
Sorry I didn't give explicit and detailed explanation at the patch description (defaultly I thought you know native acpi pad, so I ignored the background story :-)
I will add more detailed description at dom0 patch 3/3.
Thanks,
Jinsong
>
>> Core parking is a platform feature, not virtualization feature.
>> Naturally following native approach is the most efficient. Why do you
>> want to create yet another interface for Xen to do that?
>>
>> Shan Haitao
>>
>> 2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Unfortunately, yes, though cumbersome is not basic reason user
>>>>>> space approach is not preferred. Core parking is a power
>>>>>> management staff, based on dynamic physical details like cpu
>>>>>> topologies and maps owned by hypervisor. It's natural to
>>>>>> implement
>>>>>
>>>>> CPU topology is available to user space, and as far as I recall
>>>>> your hypervisor patch didn't really manipulate any maps - all it
>>>>> did was pick what CPU to bring up/down, and then carry out that
>>>>> decision.
>>>>
>>>> No. threads_per_core and cores_per_socket exposed to userspace is
>>>> pointless to us (and, it's questionable need fixup).
>>>
>>> Sure this would be insufficient. But what do you think did
>>> XEN_SYSCTL_topologyinfo get added for?
>>>
>>>> Core parking depends on following physical info (no matter where
>>>> it implement):
>>>> 1. cpu_online_map;
>>>> 2. cpu_present_map;
>>>> 3. cpu_core_mask;
>>>> 4. cpu_sibling_mask;
>>>> all of them are *dynamic*, especially, 3/4 are varied per cpu and
>>>> per online/offline ops.
>>>
>>> Afaict all of these can be reconstructed using (mostly sysctl)
>>> hypercalls.
>>>
>>> Jan
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
[-- Attachment #2: 0003-Xen-pad-logic-and-notification.patch --]
[-- Type: application/octet-stream, Size: 6797 bytes --]
From fee39804d2634dfba7b369dc82dac19b57400f84 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong <jinsong.liu@intel.com>
Date: Sat, 18 Feb 2012 00:46:29 +0800
Subject: [PATCH 3/3] Xen pad logic and notification
This patch implement Xen pad logic, and when getting pad device
notification, it hypercalls to Xen hypervisor for core parking.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
---
drivers/xen/xen_acpi_pad.c | 189 ++++++++++++++++++++++++++++++++++++++
include/xen/interface/platform.h | 14 +++
2 files changed, 203 insertions(+), 0 deletions(-)
diff --git a/drivers/xen/xen_acpi_pad.c b/drivers/xen/xen_acpi_pad.c
index 63ab2fb..ba66d51 100644
--- a/drivers/xen/xen_acpi_pad.c
+++ b/drivers/xen/xen_acpi_pad.c
@@ -1,8 +1,197 @@
+/*
+ * xen_acpi_pad.c - Xen pad interface
+ *
+ * Copyright (c) 2012, Intel Corporation.
+ * Author: Liu, Jinsong <jinsong.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <acpi/acpi_bus.h>
+#include <acpi/acpi_drivers.h>
+
+#include <asm/xen/hypercall.h>
+
+#define ACPI_PROCESSOR_AGGREGATOR_CLASS "xen_acpi_pad"
+#define ACPI_PROCESSOR_AGGREGATOR_DEVICE_NAME "Xen Processor Aggregator"
+#define ACPI_PROCESSOR_AGGREGATOR_NOTIFY 0x80
+static DEFINE_MUTEX(xen_cpu_lock);
+
+static int xen_acpi_pad_idle_cpus(int *num_cpus)
+{
+ int ret;
+
+ struct xen_platform_op op = {
+ .cmd = XENPF_core_parking,
+ .interface_version = XENPF_INTERFACE_VERSION,
+ };
+
+ /* set cpu nums expected to be idled */
+ op.u.core_parking.type = XEN_CORE_PARKING_SET;
+ op.u.core_parking.idle_nums = (uint32_t)*num_cpus;
+ ret = HYPERVISOR_dom0_op(&op);
+ if (ret)
+ return ret;
+
+ /*
+ * get cpu nums actually be idled
+ * cannot get it by using hypercall once (shared with _SET)
+ * because of the characteristic of Xen continue_hypercall_on_cpu
+ */
+ op.u.core_parking.type = XEN_CORE_PARKING_GET;
+ ret = HYPERVISOR_dom0_op(&op);
+ if (ret)
+ return ret;
+
+ *num_cpus = op.u.core_parking.idle_nums;
+ return 0;
+}
+
+/*
+ * Query firmware how many CPUs should be idle
+ * return -1 on failure
+ */
+static int xen_acpi_pad_pur(acpi_handle handle)
+{
+ struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+ union acpi_object *package;
+ int num = -1;
+
+ if (ACPI_FAILURE(acpi_evaluate_object(handle, "_PUR", NULL, &buffer)))
+ return num;
+
+ if (!buffer.length || !buffer.pointer)
+ return num;
+
+ package = buffer.pointer;
+
+ if (package->type == ACPI_TYPE_PACKAGE &&
+ package->package.count == 2 &&
+ package->package.elements[0].integer.value == 1) /* rev 1 */
+
+ num = package->package.elements[1].integer.value;
+
+ kfree(buffer.pointer);
+ return num;
+}
+
+/* Notify firmware how many CPUs are idle */
+static void xen_acpi_pad_ost(acpi_handle handle, int stat,
+ uint32_t idle_cpus)
+{
+ union acpi_object params[3] = {
+ {.type = ACPI_TYPE_INTEGER,},
+ {.type = ACPI_TYPE_INTEGER,},
+ {.type = ACPI_TYPE_BUFFER,},
+ };
+ struct acpi_object_list arg_list = {3, params};
+
+ params[0].integer.value = ACPI_PROCESSOR_AGGREGATOR_NOTIFY;
+ params[1].integer.value = stat;
+ params[2].buffer.length = 4;
+ params[2].buffer.pointer = (void *)&idle_cpus;
+ acpi_evaluate_object(handle, "_OST", &arg_list, NULL);
+}
+
+static void xen_acpi_pad_handle_notify(acpi_handle handle)
+{
+ int ret, num_cpus;
+
+ mutex_lock(&xen_cpu_lock);
+ num_cpus = xen_acpi_pad_pur(handle);
+ if (num_cpus < 0) {
+ mutex_unlock(&xen_cpu_lock);
+ return;
+ }
+
+ ret = xen_acpi_pad_idle_cpus(&num_cpus);
+ if (ret) {
+ mutex_unlock(&xen_cpu_lock);
+ return;
+ }
+
+ xen_acpi_pad_ost(handle, 0, num_cpus);
+ mutex_unlock(&xen_cpu_lock);
+}
+
+static void xen_acpi_pad_notify(acpi_handle handle, u32 event,
+ void *data)
+{
+ switch (event) {
+ case ACPI_PROCESSOR_AGGREGATOR_NOTIFY:
+ xen_acpi_pad_handle_notify(handle);
+ break;
+ default:
+ printk(KERN_WARNING "Unsupported event [0x%x]\n", event);
+ break;
+ }
+}
+
+static int xen_acpi_pad_add(struct acpi_device *device)
+{
+ acpi_status status;
+
+ strcpy(acpi_device_name(device), ACPI_PROCESSOR_AGGREGATOR_DEVICE_NAME);
+ strcpy(acpi_device_class(device), ACPI_PROCESSOR_AGGREGATOR_CLASS);
+
+ status = acpi_install_notify_handler(device->handle,
+ ACPI_DEVICE_NOTIFY, xen_acpi_pad_notify, device);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ return 0;
+}
+
+static int xen_acpi_pad_remove(struct acpi_device *device,
+ int type)
+{
+ int num_cpus = 0;
+
+ mutex_lock(&xen_cpu_lock);
+ xen_acpi_pad_idle_cpus(&num_cpus);
+ mutex_unlock(&xen_cpu_lock);
+
+ acpi_remove_notify_handler(device->handle,
+ ACPI_DEVICE_NOTIFY, xen_acpi_pad_notify);
+ return 0;
+}
+
+static const struct acpi_device_id xen_pad_device_ids[] = {
+ {"ACPI000C", 0},
+ {"", 0},
+};
+
+static struct acpi_driver xen_acpi_pad_driver = {
+ .name = "xen_processor_aggregator",
+ .class = ACPI_PROCESSOR_AGGREGATOR_CLASS,
+ .ids = xen_pad_device_ids,
+ .ops = {
+ .add = xen_acpi_pad_add,
+ .remove = xen_acpi_pad_remove,
+ },
+};
+
int xen_acpi_pad_init(void)
{
+#ifdef CONFIG_ACPI_PROCESSOR_AGGREGATOR
+ return acpi_bus_register_driver(&xen_acpi_pad_driver);
+#else
return 0;
+#endif
}
void xen_acpi_pad_exit(void)
{
+#ifdef CONFIG_ACPI_PROCESSOR_AGGREGATOR
+ acpi_bus_unregister_driver(&xen_acpi_pad_driver);
+#endif
}
diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
index c168468..56ec72a 100644
--- a/include/xen/interface/platform.h
+++ b/include/xen/interface/platform.h
@@ -297,6 +297,19 @@ struct xenpf_set_processor_pminfo {
};
DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo);
+#define XENPF_core_parking 60
+
+#define XEN_CORE_PARKING_SET 1
+#define XEN_CORE_PARKING_GET 2
+struct xenpf_core_parking {
+ /* IN variables */
+ uint32_t type;
+ /* IN variables: set cpu nums expected to be idled */
+ /* OUT variables: get cpu nums actually be idled */
+ uint32_t idle_nums;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_core_parking);
+
struct xen_platform_op {
uint32_t cmd;
uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
@@ -312,6 +325,7 @@ struct xen_platform_op {
struct xenpf_change_freq change_freq;
struct xenpf_getidletime getidletime;
struct xenpf_set_processor_pminfo set_pminfo;
+ struct xenpf_core_parking core_parking;
uint8_t pad[128];
} u;
};
--
1.7.1
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-04 15:48 ` Liu, Jinsong
@ 2012-03-05 10:57 ` Jan Beulich
2012-03-05 15:50 ` Liu, Jinsong
0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2012-03-05 10:57 UTC (permalink / raw)
To: Haitao Shan, Jinsong Liu
Cc: Shaohua Li, keir.xen@gmail.com, KonradRzeszutekWilk, xen-devel
>>> On 04.03.12 at 16:48, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Yes, it does need to intercept ACPI PAD notify then handle it similar way as
> native kernel did. The logic is at dom0 '[PATCH 3/3] Xen pad logic and
> notification' as attached.
And I know now why I didn't look at this in any detail - it modifies a
file that doesn't even exist in 3.3-rc5 (drivers/xen/xen_acpi_pad.c),
nor does it mention what tree/branch this would be against. Plus,
ACPI_PROCESSOR_AGGREGATOR is a tristate symbol, yet your code
assumes it to be boolean.
But yes, given the way this is implemented I withdraw my reservations
against the hypervisor side changes, as doing this completely in the
Dom0 kernel would be cumbersome (due to the need to call sysctl
functions), and doing it entirely in user space would indeed only be
justified if the native kernel did it this way too.
Jan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-05 10:57 ` Jan Beulich
@ 2012-03-05 15:50 ` Liu, Jinsong
0 siblings, 0 replies; 22+ messages in thread
From: Liu, Jinsong @ 2012-03-05 15:50 UTC (permalink / raw)
To: Jan Beulich, Haitao Shan
Cc: Li, Shaohua, keir.xen@gmail.com, KonradRzeszutekWilk, xen-devel
[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]
Jan Beulich wrote:
>>>> On 04.03.12 at 16:48, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Yes, it does need to intercept ACPI PAD notify then handle it
>> similar way as native kernel did. The logic is at dom0 '[PATCH 3/3]
>> Xen pad logic and notification' as attached.
>
> And I know now why I didn't look at this in any detail - it modifies a
> file that doesn't even exist in 3.3-rc5 (drivers/xen/xen_acpi_pad.c),
> nor does it mention what tree/branch this would be against. Plus,
> ACPI_PROCESSOR_AGGREGATOR is a tristate symbol, yet your code
> assumes it to be boolean.
Yes, tristate/bool is an issue of core parking dom0-side patches which has involved some discussion how to solve it in a reasonable way (as attached). I will update it later.
>
> But yes, given the way this is implemented I withdraw my reservations
> against the hypervisor side changes, as doing this completely in the
> Dom0 kernel would be cumbersome (due to the need to call sysctl
> functions), and doing it entirely in user space would indeed only be
> justified if the native kernel did it this way too.
>
> Jan
Thanks Jan, Keir, and Haitao!
[-- Attachment #2: Type: message/rfc822, Size: 8521 bytes --]
From: "Liu, Jinsong" <jinsong.liu@intel.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Brown, Len" <len.brown@intel.com>, "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, "keir.xen@gmail.com" <keir.xen@gmail.com>, Jan Beulich <jbeulich@suse.com>, "Li, Shaohua" <shaohua.li@intel.com>, "lenb@kernel.org" <lenb@kernel.org>
Subject: Re: [Xen-devel] [PATCH 1/2] RFC: Prepare PAD for native and xen platform
Date: Wed, 29 Feb 2012 08:01:54 +0000
Message-ID: <DE8DF0795D48FD4CA783C40EC82923350BAC09@SHSMSX101.ccr.corp.intel.com>
Liu, Jinsong wrote:
> Konrad Rzeszutek Wilk wrote:
>> On Sun, Feb 26, 2012 at 08:25:41AM +0000, Liu, Jinsong wrote:
>>> Liu, Jinsong wrote:
>>>> Jan Beulich wrote:
>>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 02/23/12 2:29 PM >>>
>>>>>> --- a/drivers/acpi/Kconfig
>>>>>> +++ b/drivers/acpi/Kconfig
>>>>>> @@ -213,10 +213,11 @@ config ACPI_HOTPLUG_CPU
>>>>>> default y
>>>>> >
>>>>>> config ACPI_PROCESSOR_AGGREGATOR
>>>>>> - tristate "Processor Aggregator"
>>>>>> + bool "Processor Aggregator"
>>>>>
>>>>> There must be ways to address whatever strange problem you see
>>>>> without making this piece of code non-modular.
>>>>>
>>>>
>>>> Yes, another approach is x86_init approach, defining acpi_pad_ops
>>>> at x86_init.c and overwritten when xen_start_kernel. This patch is
>>>> just a RFC patch, to evaluate which approch is more reasonable :-)
>>>>
>>>
>>> Have a more think about it, x86_init approach still need to disable
>>> acpi_pad module. Seems we have to set acpi_pad as bool, as long as
>>> it need to hook to native acpi_pad fucs/variables.
>>
>> What about the other approach I suggested where there are some
>> function overrides in osl.c? Something similar to
>> https://lkml.org/lkml/2012/1/17/401, specifically
>> https://lkml.org/lkml/2012/1/17/403 - that way you are not turning
>> the modules into being built in, but intead have the function table
>> already in the kernel (as some form of EXPORT_SYMBOL_GPL or a
>> registration function).
>>
>
> Thanks for the example (it's good itself :-), but per my
> understanding they are different cases.
>
> In the osl example case, tboot_late_init call
> acpi_os_set_prepare_sleep to register func, so it works no matter
> tboot.c built as y/n/m (through currently it's bool).
>
> However, in our case, we just cannot do so. We need
> 1. predefine a hook to native acpi pad funcs, take effect when static
> compile stage;
> 2. when xen init, redirect the hook to xen acpi pad funcs, take
> effect at running time; (The point is, we cannot do init twice for
> native and xen acpi pad, so we have to statically predefine a hook to
> native acpi_pad)
>
> For the reason above, I think we have to remove acpi_pad module, as
> long as we need to hook to native acpi_pad funcs. Thoughts?
>
Compare approaches:
1. xen overwritten approach (patches V2, x86_init, osl approach)
Pros:
a little simpler code
Cons:
1). specific to xen, cannot extend to other virt platform;
2). need to change natvie acpi_pad as modular;
2. paravirt interface approach (original patches V1)
Pros:
1). standard hypervisor-agnostic interface (USENIX conference 2006), can easily hook to Xen/lguest/... on demand;
2). arch independent;
3). no need to change native acpi_pad as non-modular;
Cons:
a little complicated code, and code patching is some overkilled for this case (but no harm);
(BTW, in the future we need add more and more pv ops, like pv_pm_ops, pv_cpu_hotplug_ops, pv_mem_hotplug_ops, etc. So how about add a pv_misc_ops template to handle them all? seems it's a common issue).
Your opinion?
Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-02 11:00 ` Keir Fraser
@ 2012-03-05 21:27 ` Haitao Shan
2012-03-05 21:50 ` Keir Fraser
0 siblings, 1 reply; 22+ messages in thread
From: Haitao Shan @ 2012-03-05 21:27 UTC (permalink / raw)
To: Keir Fraser
Cc: Liu, Jinsong, Konrad Rzeszutek Wilk, xen-devel,
keir.xen@gmail.com, Jan Beulich, Shaohua Li
Thanks, Keir. We did create new hypercalls.
But for new interface(mentioned in my previous mail), I mean the
mechanisms for kernel to notify user-space for core parking decision.
This does *not* exist in kernel. If we add it specifically for Xen, I
don't think kernel people would buy-in that.
Shan Haitao
2012/3/2 Keir Fraser <keir@xen.org>:
> On 02/03/2012 09:42, "Haitao Shan" <maillists.shan@gmail.com> wrote:
>
>> I would really doubt the need to create a new interface of receiving
>> ACPI event and sending to user land (other than existing native
>> kernel) specifically for Xen. What's the benefit and why kernel people
>> should buy-in that?
>> Core parking is a platform feature, not virtualization feature.
>> Naturally following native approach is the most efficient. Why do you
>> want to create yet another interface for Xen to do that?
>
> While I sympathise with your position rather more than Jan does, the fact is
> that it's *you* who are suggesting yet-another-Xen-interface. Whereas doing
> it in userspace requires only existing hypercalls I believe.
>
> -- Keir
>
>> Shan Haitao
>>
>> 2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>>>>> approach is not preferred. Core parking is a power management staff,
>>>>>> based on dynamic physical details like cpu topologies and maps owned
>>>>>> by hypervisor. It's natural to implement
>>>>>
>>>>> CPU topology is available to user space, and as far as I recall your
>>>>> hypervisor patch didn't really manipulate any maps - all it did was
>>>>> pick what CPU to bring up/down, and then carry out that decision.
>>>>
>>>> No. threads_per_core and cores_per_socket exposed to userspace is pointless
>>>> to us (and, it's questionable need fixup).
>>>
>>> Sure this would be insufficient. But what do you think did
>>> XEN_SYSCTL_topologyinfo get added for?
>>>
>>>> Core parking depends on following physical info (no matter where it
>>>> implement):
>>>> 1. cpu_online_map;
>>>> 2. cpu_present_map;
>>>> 3. cpu_core_mask;
>>>> 4. cpu_sibling_mask;
>>>> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
>>>> online/offline ops.
>>>
>>> Afaict all of these can be reconstructed using (mostly sysctl)
>>> hypercalls.
>>>
>>> Jan
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Core parking feature enable
2012-03-05 21:27 ` Haitao Shan
@ 2012-03-05 21:50 ` Keir Fraser
0 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2012-03-05 21:50 UTC (permalink / raw)
To: Haitao Shan, Keir Fraser
Cc: Liu, Jinsong, Shaohua Li, xen-devel, Jan Beulich,
Konrad Rzeszutek Wilk
Sorry, yes, I also had missed the ACPI interaction.
On 05/03/2012 21:27, "Haitao Shan" <maillists.shan@gmail.com> wrote:
> Thanks, Keir. We did create new hypercalls.
> But for new interface(mentioned in my previous mail), I mean the
> mechanisms for kernel to notify user-space for core parking decision.
> This does *not* exist in kernel. If we add it specifically for Xen, I
> don't think kernel people would buy-in that.
>
> Shan Haitao
>
> 2012/3/2 Keir Fraser <keir@xen.org>:
>> On 02/03/2012 09:42, "Haitao Shan" <maillists.shan@gmail.com> wrote:
>>
>>> I would really doubt the need to create a new interface of receiving
>>> ACPI event and sending to user land (other than existing native
>>> kernel) specifically for Xen. What's the benefit and why kernel people
>>> should buy-in that?
>>> Core parking is a platform feature, not virtualization feature.
>>> Naturally following native approach is the most efficient. Why do you
>>> want to create yet another interface for Xen to do that?
>>
>> While I sympathise with your position rather more than Jan does, the fact is
>> that it's *you* who are suggesting yet-another-Xen-interface. Whereas doing
>> it in userspace requires only existing hypercalls I believe.
>>
>> -- Keir
>>
>>> Shan Haitao
>>>
>>> 2012/3/1 Jan Beulich <JBeulich@suse.com>:
>>>>>>> On 01.03.12 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 01.03.12 at 12:14, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>>>>> Unfortunately, yes, though cumbersome is not basic reason user space
>>>>>>> approach is not preferred. Core parking is a power management staff,
>>>>>>> based on dynamic physical details like cpu topologies and maps owned
>>>>>>> by hypervisor. It's natural to implement
>>>>>>
>>>>>> CPU topology is available to user space, and as far as I recall your
>>>>>> hypervisor patch didn't really manipulate any maps - all it did was
>>>>>> pick what CPU to bring up/down, and then carry out that decision.
>>>>>
>>>>> No. threads_per_core and cores_per_socket exposed to userspace is
>>>>> pointless
>>>>> to us (and, it's questionable need fixup).
>>>>
>>>> Sure this would be insufficient. But what do you think did
>>>> XEN_SYSCTL_topologyinfo get added for?
>>>>
>>>>> Core parking depends on following physical info (no matter where it
>>>>> implement):
>>>>> 1. cpu_online_map;
>>>>> 2. cpu_present_map;
>>>>> 3. cpu_core_mask;
>>>>> 4. cpu_sibling_mask;
>>>>> all of them are *dynamic*, especially, 3/4 are varied per cpu and per
>>>>> online/offline ops.
>>>>
>>>> Afaict all of these can be reconstructed using (mostly sysctl)
>>>> hypercalls.
>>>>
>>>> Jan
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
>>
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2012-03-05 21:50 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-17 8:54 Core parking feature enable Liu, Jinsong
2012-02-17 9:41 ` Jan Beulich
2012-02-17 17:48 ` Liu, Jinsong
2012-02-21 8:03 ` Jan Beulich
2012-02-22 3:19 ` Liu, Jinsong
[not found] ` <DE8DF0795D48FD4CA783C40EC82923350A7F35@SHSMSX101.ccr.corp.intel.com>
2012-02-29 12:41 ` Liu, Jinsong
2012-02-29 12:47 ` Liu, Jinsong
2012-02-29 13:47 ` Jan Beulich
2012-03-01 8:20 ` Liu, Jinsong
2012-03-01 8:50 ` Jan Beulich
2012-03-01 11:14 ` Liu, Jinsong
2012-03-01 11:21 ` Jan Beulich
2012-03-01 14:31 ` Liu, Jinsong
2012-03-01 15:11 ` Jan Beulich
2012-03-02 9:42 ` Haitao Shan
2012-03-02 11:00 ` Keir Fraser
2012-03-05 21:27 ` Haitao Shan
2012-03-05 21:50 ` Keir Fraser
2012-03-02 11:46 ` Jan Beulich
2012-03-04 15:48 ` Liu, Jinsong
2012-03-05 10:57 ` Jan Beulich
2012-03-05 15:50 ` Liu, Jinsong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).