* Re: [RFC PATCH V4 4/5] cpuidle: driver for xen [not found] ` <20110322123324.28725.3131.stgit@tringupt.in.ibm.com> @ 2011-03-22 14:50 ` Konrad Rzeszutek Wilk 2011-03-23 9:57 ` Trinabh Gupta 0 siblings, 1 reply; 40+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-03-22 14:50 UTC (permalink / raw) To: Trinabh Gupta Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, lenb On Tue, Mar 22, 2011 at 06:03:28PM +0530, Trinabh Gupta wrote: > This patch implements a default cpuidle driver for xen. > Earlier pm_idle was flipped to default_idle. Maybe there > is a better way to ensure default_idle is called > without using this cpuidle driver. Please also CC the Xen devel mailing list (I did this for you) I couldn't find it in the description, but I was wondering what is that you are trying to fix? What is the problem description? Two, you mention in your description that it was tested on x86 systems. did you test this with Xen? If so, what version. > > Signed-off-by: Trinabh Gupta <trinabh@linux.vnet.ibm.com> > --- > > arch/x86/xen/setup.c | 42 +++++++++++++++++++++++++++++++++++++++++- > 1 files changed, 41 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index a8a66a5..4fce4cd 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -9,6 +9,8 @@ > #include <linux/mm.h> > #include <linux/pm.h> > #include <linux/memblock.h> > +#include <linux/cpuidle.h> > +#include <linux/slab.h> > > #include <asm/elf.h> > #include <asm/vdso.h> > @@ -321,6 +323,44 @@ void __cpuinit xen_enable_syscall(void) > #endif /* CONFIG_X86_64 */ > } > > +static struct cpuidle_driver xen_idle_driver = { > + .name = "xen_idle", > + .owner = THIS_MODULE, > + .priority = 1, > +}; > + > +extern struct cpuidle_state state_default_state; > + > +static int setup_cpuidle(int cpu) > +{ > + struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device), > + GFP_KERNEL); No checking to see if dev == NULL? > + int count = CPUIDLE_DRIVER_STATE_START; > + dev->cpu = cpu; > + dev->drv = &xen_idle_driver; > + > + dev->states[count] = state_default_idle; > + count++; > + > + dev->state_count = count; > + > + if (cpuidle_register_device(dev)) > + return -EIO; No cleanup of the 'dev' so that we don't leak memory? > + return 0; > +} > + > +static int xen_idle_init(void) > +{ > + int retval, i; > + retval = cpuidle_register_driver(&xen_idle_driver); > + > + for_each_online_cpu(i) { > + setup_cpuidle(i); > + } > + > + return 0; > +} > + > void __init xen_arch_setup(void) > { > xen_panic_handler_init(); > @@ -354,7 +394,7 @@ void __init xen_arch_setup(void) > #ifdef CONFIG_X86_32 > boot_cpu_data.hlt_works_ok = 1; > #endif > - pm_idle = default_idle; > + xen_idle_init(); > boot_option_idle_override = IDLE_HALT; > > fiddle_vdso(); > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-22 14:50 ` [RFC PATCH V4 4/5] cpuidle: driver for xen Konrad Rzeszutek Wilk @ 2011-03-23 9:57 ` Trinabh Gupta 2011-03-24 7:18 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Trinabh Gupta @ 2011-03-23 9:57 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: arjan, peterz, lenb, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel On 03/22/2011 08:20 PM, Konrad Rzeszutek Wilk wrote: > On Tue, Mar 22, 2011 at 06:03:28PM +0530, Trinabh Gupta wrote: >> This patch implements a default cpuidle driver for xen. >> Earlier pm_idle was flipped to default_idle. Maybe there >> is a better way to ensure default_idle is called >> without using this cpuidle driver. > Hi Konrad, > Please also CC the Xen devel mailing list (I did this for you) Yes, I will. Thanks > > I couldn't find it in the description, but I was wondering > what is that you are trying to fix? What is the problem description? We are trying to remove the exported function pointer pm_idle which is set to the desired idle routine to be used. For example, xen sets it to default_idle. Having exported function pointer is not very secure. The first problem is that this exported pointer can be modified/flipped by any subsystem. There is no tracking or notification mechanism. Secondly and more importantly, various subsystems save the value of this pointer, flip it and later restore to the saved value. There is no guarantee that the saved value is still valid (see http://lkml.org/lkml/2009/8/28/43 and http://lkml.org/lkml/2009/8/28/50) The first patch of the series removed pm_idle and now we directly call into CPUIdle subsystem. As a result for all the subsystems using pm_idle, we have to implement a driver that can be registered to cpuidle. > > Two, you mention in your description that it was tested on x86 systems. did > you test this with Xen? If so, what version. The patches are still in RFC stage and haven't been tested with Xen. Once we are clear on a particular solution, we will carefully test the patches. Thanks for the code review. Yes, I have missed a couple of things. I will look at how to maintain per cpu dev pointers and free the memory. Thanks, -Trinabh > >> >> Signed-off-by: Trinabh Gupta<trinabh@linux.vnet.ibm.com> >> --- >> >> arch/x86/xen/setup.c | 42 +++++++++++++++++++++++++++++++++++++++++- >> 1 files changed, 41 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c >> index a8a66a5..4fce4cd 100644 >> --- a/arch/x86/xen/setup.c >> +++ b/arch/x86/xen/setup.c >> @@ -9,6 +9,8 @@ >> #include<linux/mm.h> >> #include<linux/pm.h> >> #include<linux/memblock.h> >> +#include<linux/cpuidle.h> >> +#include<linux/slab.h> >> >> #include<asm/elf.h> >> #include<asm/vdso.h> >> @@ -321,6 +323,44 @@ void __cpuinit xen_enable_syscall(void) >> #endif /* CONFIG_X86_64 */ >> } >> >> +static struct cpuidle_driver xen_idle_driver = { >> + .name = "xen_idle", >> + .owner = THIS_MODULE, >> + .priority = 1, >> +}; >> + >> +extern struct cpuidle_state state_default_state; >> + >> +static int setup_cpuidle(int cpu) >> +{ >> + struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device), >> + GFP_KERNEL); > > No checking to see if dev == NULL? >> + int count = CPUIDLE_DRIVER_STATE_START; >> + dev->cpu = cpu; >> + dev->drv =&xen_idle_driver; >> + >> + dev->states[count] = state_default_idle; >> + count++; >> + >> + dev->state_count = count; >> + >> + if (cpuidle_register_device(dev)) >> + return -EIO; > No cleanup of the 'dev' so that we don't leak memory? > >> + return 0; >> +} >> + >> +static int xen_idle_init(void) >> +{ >> + int retval, i; >> + retval = cpuidle_register_driver(&xen_idle_driver); >> + >> + for_each_online_cpu(i) { >> + setup_cpuidle(i); >> + } >> + >> + return 0; >> +} >> + >> void __init xen_arch_setup(void) >> { >> xen_panic_handler_init(); >> @@ -354,7 +394,7 @@ void __init xen_arch_setup(void) >> #ifdef CONFIG_X86_32 >> boot_cpu_data.hlt_works_ok = 1; >> #endif >> - pm_idle = default_idle; >> + xen_idle_init(); >> boot_option_idle_override = IDLE_HALT; >> >> fiddle_vdso(); >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-23 9:57 ` Trinabh Gupta @ 2011-03-24 7:18 ` Len Brown 2011-03-24 12:05 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-24 7:18 UTC (permalink / raw) To: Trinabh Gupta Cc: Konrad Rzeszutek Wilk, arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel Is a CONFIG_XEN kernel supposed to use just HLT in idle? xen_arch_setup() does this: pm_idle = default_idle; boot_option_idle_override = IDLE_HALT; which has that effect. I guess this makes sense b/c the CONFIG_XEN kernel is Dom0 and the real C-sates are done by the hypervisor? Would the same CONFIG_XEN kernel binary ever not run xen_arch_setup(), run on raw hardware, and want to use idle states other than HLT? thanks, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-24 7:18 ` Len Brown @ 2011-03-24 12:05 ` Konrad Rzeszutek Wilk 2011-03-25 7:19 ` Len Brown 2011-03-25 14:38 ` [Xen-devel] " Jeremy Fitzhardinge 0 siblings, 2 replies; 40+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-03-24 12:05 UTC (permalink / raw) To: Len Brown Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, Trinabh Gupta On Thu, Mar 24, 2011 at 03:18:14AM -0400, Len Brown wrote: > Is a CONFIG_XEN kernel supposed to use just HLT in idle? For right now.. > > xen_arch_setup() does this: > > pm_idle = default_idle; > boot_option_idle_override = IDLE_HALT; > > which has that effect. I guess this makes sense b/c the > CONFIG_XEN kernel is Dom0 and the real C-sates are done > by the hypervisor? Correct. There are some patches that make the C-states be visible in the Linux kernel, but that hasn't been ported over yet. > > Would the same CONFIG_XEN kernel binary ever not > run xen_arch_setup(), run on raw hardware, and want ever not? I am not sure of the question, so let me state: The Linux kernel if compiled with CONFIG_XEN, and if run on native hardware, would _never_ run 'xen_arch_setup()'*. It would run the normal, native type setup. > to use idle states other than HLT? > *: It could if you really really wanted. You would need to change the GRUB2 to inject some extra data in the 'sub_hardware' flag to be the Xen specific. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-24 12:05 ` Konrad Rzeszutek Wilk @ 2011-03-25 7:19 ` Len Brown 2011-03-25 14:43 ` Jeremy Fitzhardinge 2011-03-25 14:38 ` [Xen-devel] " Jeremy Fitzhardinge 1 sibling, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-25 7:19 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Trinabh Gupta, arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel > On Thu, Mar 24, 2011 at 03:18:14AM -0400, Len Brown wrote: > > Is a CONFIG_XEN kernel supposed to use just HLT in idle? > > For right now.. > > > > xen_arch_setup() does this: > > > > pm_idle = default_idle; > > boot_option_idle_override = IDLE_HALT; > > > > which has that effect. I guess this makes sense b/c the > > CONFIG_XEN kernel is Dom0 and the real C-sates are done > > by the hypervisor? > > Correct. There are some patches that make the C-states > be visible in the Linux kernel, but that hasn't been ported > over yet. The Xen Dom0 kernel will trap into the hypervisor whenever it does a HLT or an MWAIT, yes? What is the benefit of having Dom0 decided between C-states that it can't actually enter? What is the mechanism by which those C-states are made visible to Dom0, and how are those states related to the states that are supported on the bare iron? > > Would the same CONFIG_XEN kernel binary ever not > > run xen_arch_setup(), run on raw hardware, and want > > to use idle states other than HLT? > ever not? I am not sure of the question, so let me state: > The Linux kernel if compiled with CONFIG_XEN, and if run on > native hardware, would _never_ run 'xen_arch_setup()'*. It would > run the normal, native type setup. Thanks, that clarifies. -Len Brown, Intel Open Source Technolgy Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-25 7:19 ` Len Brown @ 2011-03-25 14:43 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 40+ messages in thread From: Jeremy Fitzhardinge @ 2011-03-25 14:43 UTC (permalink / raw) To: Len Brown Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, Konrad Rzeszutek Wilk, linux-kernel, xen-devel, arjan, Trinabh Gupta On 03/25/2011 07:19 AM, Len Brown wrote: > The Xen Dom0 kernel will trap into the hypervisor > whenever it does a HLT or an MWAIT, yes? Yes, on hlt. > What is the benefit of having Dom0 decided between > C-states that it can't actually enter? There might be a slight benefit to allow a domain to tell Xen what its overall utilisation is (ie, "I'd like this VCPU to run, but it isn't very important so you can take that into account when choosing scheduling priority and/or PCPU performance"). But there's nothing like that at present. > What is the mechanism by which those C-states are > made visible to Dom0, and how are those states > related to the states that are supported on > the bare iron? Because dom0 is the official ACPI owner (ie, it has the AML interpreter), we need dom0 to handle complex interactions with ACPI (the hypervisor can do simple table parsing). At present the mechanism for power states is that dom0 gets them out of ACPI, and then passes them to Xen which actually uses them. But no guest kernel has any runtime use of power states. J ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-24 12:05 ` Konrad Rzeszutek Wilk 2011-03-25 7:19 ` Len Brown @ 2011-03-25 14:38 ` Jeremy Fitzhardinge 2011-03-31 2:02 ` Len Brown 1 sibling, 1 reply; 40+ messages in thread From: Jeremy Fitzhardinge @ 2011-03-25 14:38 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Len Brown, venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, Trinabh Gupta On 03/24/2011 12:05 PM, Konrad Rzeszutek Wilk wrote: > On Thu, Mar 24, 2011 at 03:18:14AM -0400, Len Brown wrote: >> Is a CONFIG_XEN kernel supposed to use just HLT in idle? > For right now.. For always, I should think. >> xen_arch_setup() does this: >> >> pm_idle = default_idle; >> boot_option_idle_override = IDLE_HALT; >> >> which has that effect. I guess this makes sense b/c the >> CONFIG_XEN kernel is Dom0 and the real C-sates are done >> by the hypervisor? > Correct. There are some patches that make the C-states > be visible in the Linux kernel, but that hasn't been ported > over yet. All we need is for the idle CPU to block in the hypervisor; a plain "hlt" is always going to be sufficient (which is overridden as a pvop into a sched_idle hypercall). Xen will choose an appropriate power state for the physical cpus depending on the overall busyness of the system (which any individual virtual machine can't determine). J ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-25 14:38 ` [Xen-devel] " Jeremy Fitzhardinge @ 2011-03-31 2:02 ` Len Brown 2011-03-31 21:26 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-31 2:02 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, Konrad Rzeszutek Wilk, linux-kernel, xen-devel, arjan, Trinabh Gupta > >> Is a CONFIG_XEN kernel supposed to use just HLT in idle? > > For right now.. > > For always, I should think. Yay! > >> xen_arch_setup() does this: > >> > >> pm_idle = default_idle; > >> boot_option_idle_override = IDLE_HALT; > >> > >> which has that effect. I guess this makes sense b/c the > >> CONFIG_XEN kernel is Dom0 and the real C-sates are done > >> by the hypervisor? > > Correct. There are some patches that make the C-states > > be visible in the Linux kernel, but that hasn't been ported > > over yet. > > All we need is for the idle CPU to block in the hypervisor; a plain > "hlt" is always going to be sufficient (which is overridden as a pvop > into a sched_idle hypercall). > > Xen will choose an appropriate power state for the physical cpus > depending on the overall busyness of the system (which any individual > virtual machine can't determine). Okay, knowing that the Dom0 kernel 1. can boot in non-xen mode on bare hardware and run cpuidle 2. needs just HLT when booted in xen mode will help us keep things simple. thanks, Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-31 2:02 ` Len Brown @ 2011-03-31 21:26 ` Len Brown 2011-03-31 22:36 ` [Xen-devel] " Jeremy Fitzhardinge 0 siblings, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-31 21:26 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, Konrad Rzeszutek Wilk, linux-kernel, xen-devel, arjan, Trinabh Gupta > > >> xen_arch_setup() does this: > > >> > > >> pm_idle = default_idle; > > >> boot_option_idle_override = IDLE_HALT; What happens on a Xen kernel if these lines are not there? Does Xen export the C-states tables to Dom0 kernel, and the Dom0 kernel has an acpi processor driver, and thus it would try to use all the C-states? If yes, must Xen show those tables to Dom? If it did not, then the lines above would not be necessary, as in the absence of any C-states, the kernel should use halt by default. thanks, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-31 21:26 ` Len Brown @ 2011-03-31 22:36 ` Jeremy Fitzhardinge 2011-04-01 3:03 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Jeremy Fitzhardinge @ 2011-03-31 22:36 UTC (permalink / raw) To: Len Brown Cc: Konrad Rzeszutek Wilk, venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, Trinabh Gupta On 03/31/2011 02:26 PM, Len Brown wrote: >>>>> xen_arch_setup() does this: >>>>> >>>>> pm_idle = default_idle; >>>>> boot_option_idle_override = IDLE_HALT; > What happens on a Xen kernel if these lines are not there? > Does Xen export the C-states tables to Dom0 kernel, and the Dom0 > kernel has an acpi processor driver, and thus it would try to > use all the C-states? If they're no there it tries to use the Intel cpuidle driver, which fails (just hangs forever in idle, I think). > If yes, must Xen show those tables to Dom? > If it did not, then the lines above would not be necessary, > as in the absence of any C-states, the kernel should > use halt by default. The dom0 kernel gets all the ACPI state so it can get all the juicy goodness from it. It does extract the C state info, but passes it back to Xen rather than use it itself. We don't generally try to filter the ACPI state before letting dom0 see it (DMAR being the exception, since dom0 really has no business knowing about that). (We have this basic problem that neither Xen nor dom0 are the ideal owners of ACPI. In principle Xen should own ACPI as the most privileged "OS", but it really only cares about things like power states, interrupt routing, system topology, busses, etc. But dom0 cares about lid switches, magic keyboard keys, volume controls, video output switching, etc, etc. At the moment it seems to work best if dom0 do all ACPI processing then pass Xen the parts it needs, which are generally fixed-at-startup config info items.) J ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen 2011-03-31 22:36 ` [Xen-devel] " Jeremy Fitzhardinge @ 2011-04-01 3:03 ` Len Brown 0 siblings, 0 replies; 40+ messages in thread From: Len Brown @ 2011-04-01 3:03 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, Konrad Rzeszutek Wilk, linux-kernel, xen-devel, arjan, Trinabh Gupta > >>>>> xen_arch_setup() does this: > >>>>> > >>>>> pm_idle = default_idle; > >>>>> boot_option_idle_override = IDLE_HALT; > > What happens on a Xen kernel if these lines are not there? > > Does Xen export the C-states tables to Dom0 kernel, and the Dom0 > > kernel has an acpi processor driver, and thus it would try to > > use all the C-states? > > If they're no there it tries to use the Intel cpuidle driver, which > fails (just hangs forever in idle, I think). > > > If yes, must Xen show those tables to Dom? > > If it did not, then the lines above would not be necessary, > > as in the absence of any C-states, the kernel should > > use halt by default. > > The dom0 kernel gets all the ACPI state so it can get all the juicy > goodness from it. It does extract the C state info, but passes it back > to Xen rather than use it itself. We don't generally try to filter the > ACPI state before letting dom0 see it (DMAR being the exception, since > dom0 really has no business knowing about that). > > (We have this basic problem that neither Xen nor dom0 are the ideal > owners of ACPI. In principle Xen should own ACPI as the most privileged > "OS", but it really only cares about things like power states, interrupt > routing, system topology, busses, etc. But dom0 cares about lid > switches, magic keyboard keys, volume controls, video output switching, > etc, etc. At the moment it seems to work best if dom0 do all ACPI > processing then pass Xen the parts it needs, which are generally > fixed-at-startup config info items.) Okay. Since a Xen kernel can also boot on bare iron, and thus includes cpuidle, acpi_idle, intel_idle; and when in Dom0 mode it simply wants to run HLT in idle... I think what we want to do is simply disable cpuidle when booted in Dom0 mode. That will allow it to fall back to default_idle w/o xen_setup needing to tinker with installing an idle driver. I'll send a patch in the next series. If you can try it out, that would be great. thanks, -Len ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <20110322123233.28725.92874.stgit@tringupt.in.ibm.com>]
[parent not found: <alpine.LFD.2.02.1103222254420.10549@x980>]
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection [not found] ` <alpine.LFD.2.02.1103222254420.10549@x980> @ 2011-03-23 9:22 ` Trinabh Gupta 2011-03-23 20:51 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Trinabh Gupta @ 2011-03-23 9:22 UTC (permalink / raw) To: Len Brown Cc: arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel Hi Len, The goal of the patch series is to remove exported pm_idle function pointer (see http://lkml.org/lkml/2009/8/28/43 and http://lkml.org/lkml/2009/8/28/50 for problems related to pm_idle). The first patch in the series removes pm_idle for x86 and we now directly call cpuidle_idle_call as suggested by Arjan (https://lkml.org/lkml/2010/10/19/453). But we also have to replace the functionality provided by pm_idle, i.e. call default_idle for platforms where no better idle routine exists, call mwait for pre-nehalem platforms, use intel_idle or acpi_idle for nehalem architectures etc. To manage all this we need a registration mechanism which is conveniently provided by cpuidle. In theory I agree that we can maybe do without list based registration i.e probe and pick the best for the platform, but things may become less predictable and difficult to manage as we have more and more platforms and drivers. By directly calling into cpuidle, we already have arch default other than intel_idle and acpi_idle. Then APM and xen (though it uses default_idle) also have their own idle routines. List based management and selection based on priority would provide a cleaner solution. Thanks, -Trinabh On 03/23/2011 08:29 AM, Len Brown wrote: > the original cpuidle prototype supported multiple driver registration, > but no production use for it could be imagined, and so it was deleted. > > Subsequently on x86, we added intel_idle to replace acpi_idle > and a typical kernel will have them both built in. > We still don't allow mutliple registrations, we just arrange > affairs such that the preferred intel_idle probes before > the backup, acpi_idle. If intel_idle recognizes the platform, > its probe succeeds, else acpi_idle gets a go. > If there is a problem with intel_idle, or a comparison needs to be made, > a bootparam is available to tell intel_idle not to probe. > > This mechanism takes approximately 10 lines of code -- the bootparam > to disable the preferred driver. > > What is the benefit of all the code to support the feature of run-time > multiple driver registration and switching? > > thanks, > Len Brown, Intel Open Source Technology Center > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-23 9:22 ` [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection Trinabh Gupta @ 2011-03-23 20:51 ` Len Brown 2011-03-24 4:41 ` Len Brown 2011-03-24 14:13 ` Trinabh Gupta 0 siblings, 2 replies; 40+ messages in thread From: Len Brown @ 2011-03-23 20:51 UTC (permalink / raw) To: Trinabh Gupta Cc: arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel > The goal of the patch series is to remove exported pm_idle function > pointer (see http://lkml.org/lkml/2009/8/28/43 and > http://lkml.org/lkml/2009/8/28/50 for problems related to pm_idle). > The first patch in the series removes pm_idle for x86 and we > now directly call cpuidle_idle_call as suggested by Arjan > (https://lkml.org/lkml/2010/10/19/453). So the problem statement with "pm_idle" is that it is visible to modules and thus potentially racey and unsafe? Any reason we can't delete his line today to address most of the concern? EXPORT_SYMBOL(pm_idle); > But we also have to replace the functionality provided by pm_idle, > i.e. call default_idle for platforms where no better idle routine > exists, call mwait for pre-nehalem platforms, use intel_idle or > acpi_idle for nehalem architectures etc. To manage all this > we need a registration mechanism which is conveniently provided > by cpuidle. It isn't immediately clear to me that all of these options need to be preserved. Are we suggesting that x86 must always build with cpuidle? I'm sure that somebody someplace will object to that. OTOH, if cpuidle is included, I'd like to see the non-cpuidle code excluded, since nobody will run it... > In theory I agree that we can maybe do without list based > registration i.e probe and pick the best for the platform, but things > may become less predictable and difficult to manage as > we have more and more platforms and drivers. > By directly calling into cpuidle, we already have arch default > other than intel_idle and acpi_idle. Then APM and xen (though > it uses default_idle) also have their own idle routines. > List based management and selection based on priority would provide Does anybody actually use the latest kernel in APM mode? I'm not even sure the last version of Windows that would talk to APM, it was whatever was before Windows-95, I think. But don't get me wrong, I agree that pm_idle should go. I agree that cpuidle should have a default other than the polling loop it currently uses. I just don't think we should spend a lot of code and time preserving every conceivable option and feature. We should first do some spring cleaning to see if we can simplify the problem. thanks, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-23 20:51 ` Len Brown @ 2011-03-24 4:41 ` Len Brown 2011-03-24 14:13 ` Trinabh Gupta 1 sibling, 0 replies; 40+ messages in thread From: Len Brown @ 2011-03-24 4:41 UTC (permalink / raw) To: Trinabh Gupta Cc: arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel > Does anybody actually use the latest kernel in APM mode? Google tells me that Windows 2000 still supported APM, and it was still present in Windows XP APM was not present in Windows Vista, aka Windows 2006. MS dropped support in their latest OS 5 years ago. When will the latest Linux drop APM support? -Len ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-23 20:51 ` Len Brown 2011-03-24 4:41 ` Len Brown @ 2011-03-24 14:13 ` Trinabh Gupta 2011-03-24 16:52 ` Vaidyanathan Srinivasan 2011-03-25 7:05 ` Len Brown 1 sibling, 2 replies; 40+ messages in thread From: Trinabh Gupta @ 2011-03-24 14:13 UTC (permalink / raw) To: Len Brown, arjan, peterz Cc: suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel On 03/24/2011 02:21 AM, Len Brown wrote: >> The goal of the patch series is to remove exported pm_idle function >> pointer (see http://lkml.org/lkml/2009/8/28/43 and >> http://lkml.org/lkml/2009/8/28/50 for problems related to pm_idle). >> The first patch in the series removes pm_idle for x86 and we >> now directly call cpuidle_idle_call as suggested by Arjan >> (https://lkml.org/lkml/2010/10/19/453). > > So the problem statement with "pm_idle" is that it is visible to modules > and thus potentially racey and unsafe? > > Any reason we can't delete his line today to address most of the concern? Hi Len, I think there are other problems too, related to saving and restoring of pm_idle pointer. For example, cpuidle itself saves current value of pm_idle, flips it and then restores the saved value. There is no guarantee that the saved function still exists. APM does exact same thing (though it may not be used these days). The problem also is that a number of architectures have copied the same design based on pm_idle; so its spreading. > > EXPORT_SYMBOL(pm_idle); > >> But we also have to replace the functionality provided by pm_idle, >> i.e. call default_idle for platforms where no better idle routine >> exists, call mwait for pre-nehalem platforms, use intel_idle or >> acpi_idle for nehalem architectures etc. To manage all this >> we need a registration mechanism which is conveniently provided >> by cpuidle. > > It isn't immediately clear to me that all of these options > need to be preserved. So what do you suggest can be removed? > > Are we suggesting that x86 must always build with cpuidle? > I'm sure that somebody someplace will object to that. Arjan argued that since almost everyone today runs cpuidle it may be best to include it in the kernel (https://lkml.org/lkml/2010/10/20/243). But yes, we agreed that we would have to make cpuidle lighter incrementally. Making ladder governor optional could be one way for example. > > OTOH, if cpuidle is included, I'd like to see the > non-cpuidle code excluded, since nobody will run it... > >> In theory I agree that we can maybe do without list based >> registration i.e probe and pick the best for the platform, but things >> may become less predictable and difficult to manage as >> we have more and more platforms and drivers. >> By directly calling into cpuidle, we already have arch default >> other than intel_idle and acpi_idle. Then APM and xen (though >> it uses default_idle) also have their own idle routines. >> List based management and selection based on priority would provide > > Does anybody actually use the latest kernel in APM mode? > I'm not even sure the last version of Windows that would talk to APM, > it was whatever was before Windows-95, I think. > > But don't get me wrong, I agree that pm_idle should go. > I agree that cpuidle should have a default other than > the polling loop it currently uses. Sure, thanks -Trinabh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-24 14:13 ` Trinabh Gupta @ 2011-03-24 16:52 ` Vaidyanathan Srinivasan 2011-03-25 7:13 ` Len Brown 2011-03-25 7:05 ` Len Brown 1 sibling, 1 reply; 40+ messages in thread From: Vaidyanathan Srinivasan @ 2011-03-24 16:52 UTC (permalink / raw) To: Trinabh Gupta Cc: Len Brown, arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel * Trinabh Gupta <trinabh@linux.vnet.ibm.com> [2011-03-24 19:43:43]: [snip] > >>But we also have to replace the functionality provided by pm_idle, > >>i.e. call default_idle for platforms where no better idle routine > >>exists, call mwait for pre-nehalem platforms, use intel_idle or > >>acpi_idle for nehalem architectures etc. To manage all this > >>we need a registration mechanism which is conveniently provided > >>by cpuidle. > > > >It isn't immediately clear to me that all of these options > >need to be preserved. > > So what do you suggest can be removed? Can we use safe_halt() until intel_idle/acpi_idle take over? But what if they do not take over? If safe_halt() is not very bad compared to the variants like mwait_idle and c1e_idle, then we can remove the old code and no need to move them to default driver. > >Are we suggesting that x86 must always build with cpuidle? > >I'm sure that somebody someplace will object to that. > > Arjan argued that since almost everyone today runs cpuidle > it may be best to include it in the kernel > (https://lkml.org/lkml/2010/10/20/243). But yes, we agreed > that we would have to make cpuidle lighter incrementally. > Making ladder governor optional could be one way for example. > > > >OTOH, if cpuidle is included, I'd like to see the > >non-cpuidle code excluded, since nobody will run it... The non-cpuidle code will be the select_idle_routine() and related function that cam move to default_driver that register to cpuidle. We can load on-demand as module if better routines fail to register. Maybe we don't need this at all as discussed in the above point? --Vaidy [snip] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-24 16:52 ` Vaidyanathan Srinivasan @ 2011-03-25 7:13 ` Len Brown 0 siblings, 0 replies; 40+ messages in thread From: Len Brown @ 2011-03-25 7:13 UTC (permalink / raw) To: Vaidyanathan Srinivasan Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, Trinabh Gupta > > So what do you suggest can be removed? > > Can we use safe_halt() until intel_idle/acpi_idle take over? But what > if they do not take over? If safe_halt() is not very bad compared to > the variants like mwait_idle and c1e_idle, then we can remove the old > code and no need to move them to default driver. One reason I'd like a default cpuidle driver is that today there is a race. cpuidle registers, but until its driver registers it will use polling. go ahead and look: grep . /sys/devices/system/cpu/cpu*/cpuidle/state0/usage that should be 0, but it isn't... > > >Are we suggesting that x86 must always build with cpuidle? > > >I'm sure that somebody someplace will object to that. > > > > Arjan argued that since almost everyone today runs cpuidle > > it may be best to include it in the kernel > > (https://lkml.org/lkml/2010/10/20/243). But yes, we agreed > > that we would have to make cpuidle lighter incrementally. > > Making ladder governor optional could be one way for example. > > > > > >OTOH, if cpuidle is included, I'd like to see the > > >non-cpuidle code excluded, since nobody will run it... > > The non-cpuidle code will be the select_idle_routine() and related > function that cam move to default_driver that register to cpuidle. > We can load on-demand as module if better routines fail to register. > Maybe we don't need this at all as discussed in the above point? Right, though I don't share your fascination with modules. cheers, Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-24 14:13 ` Trinabh Gupta 2011-03-24 16:52 ` Vaidyanathan Srinivasan @ 2011-03-25 7:05 ` Len Brown 2011-03-25 15:35 ` [Xen-devel] " Konrad Rzeszutek Wilk 1 sibling, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-25 7:05 UTC (permalink / raw) To: Trinabh Gupta Cc: arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel > I think there are other problems too, related to saving and restoring > of pm_idle pointer. For example, cpuidle itself saves current value > of pm_idle, flips it and then restores the saved value. There is > no guarantee that the saved function still exists. APM does exact > same thing (though it may not be used these days). > > The problem also is that a number of architectures have copied the > same design based on pm_idle; so its spreading. pm_idle is a primitive design yes, but I think the issue with pm_idle is a theoretical one, at least on x86; as there isn't any other code scribbling on pm_idle in practice. So this is clean-up, rather than bug-fix work... > > It isn't immediately clear to me that all of these options > > need to be preserved. > > So what do you suggest can be removed? I sent a series of small patches yesterday to get the ball rolling... https://lkml.org/lkml/2011/3/24/54 I think the xen thing can go away. I proposed that APM be removed entirely, but that will take a few releases to conclude.... > > Are we suggesting that x86 must always build with cpuidle? > > I'm sure that somebody someplace will object to that. > > Arjan argued that since almost everyone today runs cpuidle > it may be best to include it in the kernel > (https://lkml.org/lkml/2010/10/20/243). But yes, we agreed > that we would have to make cpuidle lighter incrementally. > Making ladder governor optional could be one way for example. ladder is already optional. cheers, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xen-devel] Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-25 7:05 ` Len Brown @ 2011-03-25 15:35 ` Konrad Rzeszutek Wilk 2011-03-31 2:25 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-03-25 15:35 UTC (permalink / raw) To: Len Brown Cc: Trinabh Gupta, venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan On Fri, Mar 25, 2011 at 03:05:36AM -0400, Len Brown wrote: > > I think there are other problems too, related to saving and restoring > > of pm_idle pointer. For example, cpuidle itself saves current value > > of pm_idle, flips it and then restores the saved value. There is > > no guarantee that the saved function still exists. APM does exact > > same thing (though it may not be used these days). > > > > The problem also is that a number of architectures have copied the > > same design based on pm_idle; so its spreading. > > pm_idle is a primitive design yes, but I think the issue > with pm_idle is a theoretical one, at least on x86; > as there isn't any other code scribbling on pm_idle > in practice. So this is clean-up, rather than bug-fix work... > > > > It isn't immediately clear to me that all of these options > > > need to be preserved. > > > > So what do you suggest can be removed? > > I sent a series of small patches yesterday to get the ball rolling... > https://lkml.org/lkml/2011/3/24/54 > > I think the xen thing can go away. The xen thing being the setting of cpuidle to halt or the proposed patch? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection 2011-03-25 15:35 ` [Xen-devel] " Konrad Rzeszutek Wilk @ 2011-03-31 2:25 ` Len Brown 0 siblings, 0 replies; 40+ messages in thread From: Len Brown @ 2011-03-31 2:25 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: venki, ak, suresh.b.siddha, sfr, peterz, benh, linux-kernel, xen-devel, arjan, Trinabh Gupta thanks, Len Brown, Intel Open Source Technology Center On Fri, 25 Mar 2011, Konrad Rzeszutek Wilk wrote: > On Fri, Mar 25, 2011 at 03:05:36AM -0400, Len Brown wrote: > > > I think there are other problems too, related to saving and restoring > > > of pm_idle pointer. For example, cpuidle itself saves current value > > > of pm_idle, flips it and then restores the saved value. There is > > > no guarantee that the saved function still exists. APM does exact > > > same thing (though it may not be used these days). > > > > > > The problem also is that a number of architectures have copied the > > > same design based on pm_idle; so its spreading. > > > > pm_idle is a primitive design yes, but I think the issue > > with pm_idle is a theoretical one, at least on x86; > > as there isn't any other code scribbling on pm_idle > > in practice. So this is clean-up, rather than bug-fix work... > > > > > > It isn't immediately clear to me that all of these options > > > > need to be preserved. > > > > > > So what do you suggest can be removed? > > > > I sent a series of small patches yesterday to get the ball rolling... > > https://lkml.org/lkml/2011/3/24/54 > > > > I think the xen thing can go away. > > The xen thing being the setting of cpuidle to halt or the proposed > patch? I don't think Xen needs a cpuidle driver. Xen just needs to tell the kernel to call HALT, and I think we'll keep that around for the non-cpuidle case, and for the idle periods before cpuidle initializes. cheers, Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <20110322123244.28725.32435.stgit@tringupt.in.ibm.com>]
[parent not found: <alpine.LFD.2.02.1103222304300.10549@x980>]
* Re: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86 [not found] ` <alpine.LFD.2.02.1103222304300.10549@x980> @ 2011-03-23 9:31 ` Trinabh Gupta 2011-03-24 16:32 ` Vaidyanathan Srinivasan 0 siblings, 1 reply; 40+ messages in thread From: Trinabh Gupta @ 2011-03-23 9:31 UTC (permalink / raw) To: Len Brown Cc: arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel On 03/23/2011 08:43 AM, Len Brown wrote: > Why is this patch a step forward? Hi Len, I have basically moved the code for arch default and mwait idle from arch/x86/kernel/process.c to a driver. This was suggested by Venki (https://lkml.org/lkml/2010/10/19/460) as part of pm_idle cleanup and direct call of cpuidle_idle_call(). There is not much new code here. > >> +obj-$(CONFIG_X86) += default_driver.o > > BTW, that's a pretty generic name for an x86 specific idle driver... > > I think that on builds that support intel_idle and acpi_idle, > everything in this file will be unused, unless somebody uses some > debugging cmdline params that should have been deleted ages ago. Yes, I agree that the name has to be x86 specific. I think the routines would be used for pre-nehalem architectures that use arch default or mwait. Thanks, -Trinabh > > thanks, > Len Brown, Intel Open Source Technology Center > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86 2011-03-23 9:31 ` [RFC PATCH V4 3/5] cpuidle: default idle driver for x86 Trinabh Gupta @ 2011-03-24 16:32 ` Vaidyanathan Srinivasan 0 siblings, 0 replies; 40+ messages in thread From: Vaidyanathan Srinivasan @ 2011-03-24 16:32 UTC (permalink / raw) To: Trinabh Gupta Cc: Len Brown, arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, sfr, xen-devel * Trinabh Gupta <trinabh@linux.vnet.ibm.com> [2011-03-23 15:01:14]: > > On 03/23/2011 08:43 AM, Len Brown wrote: > >Why is this patch a step forward? > > Hi Len, > > I have basically moved the code for arch default and mwait > idle from arch/x86/kernel/process.c to a driver. This was > suggested by Venki (https://lkml.org/lkml/2010/10/19/460) > as part of pm_idle cleanup and direct call of > cpuidle_idle_call(). There is not much new code here. > > > > >>+obj-$(CONFIG_X86) += default_driver.o > > > >BTW, that's a pretty generic name for an x86 specific idle driver... > > > >I think that on builds that support intel_idle and acpi_idle, > >everything in this file will be unused, unless somebody uses some > >debugging cmdline params that should have been deleted ages ago. > > Yes, I agree that the name has to be x86 specific. I think the > routines would be used for pre-nehalem architectures that use > arch default or mwait. Mainly selection between default_idle (safe_halt), mwait_idle and c1e_idle needs to be placed in a default driver. This is the code that was 'outside' of cpuidle framework and directly used pm_idle(). This is mostly unused and overridden by intel_idle or acpi_idle, but still cannot be discarded. Maybe keep this as a module and probe/load only if both intel_idle and acpi_idle failed to load or excluded by command line or otherwise. --Vaidy ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <20110322123223.28725.21929.stgit@tringupt.in.ibm.com>]
[parent not found: <20110323120044.bb8c0ae1.sfr@canb.auug.org.au>]
* Re: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86 [not found] ` <20110323120044.bb8c0ae1.sfr@canb.auug.org.au> @ 2011-03-23 10:10 ` Trinabh Gupta 0 siblings, 0 replies; 40+ messages in thread From: Trinabh Gupta @ 2011-03-23 10:10 UTC (permalink / raw) To: Stephen Rothwell Cc: arjan, peterz, lenb, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel Hi Stephen, On 03/23/2011 06:30 AM, Stephen Rothwell wrote: > Hi, > > Just some simple comments. > > Does having this patch first in the series break APM idle? Thanks for reviewing. Yes, I think removal of "pm_idle = default_idle" statement in APM may have to be moved here. > > On Tue, 22 Mar 2011 18:02:27 +0530 Trinabh Gupta<trinabh@linux.vnet.ibm.com> wrote: >> >> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c >> index 8d12878..17b7101 100644 >> --- a/arch/x86/kernel/process_32.c >> +++ b/arch/x86/kernel/process_32.c >> @@ -74,6 +74,8 @@ static inline void play_dead(void) >> } >> #endif >> >> +extern void cpuidle_idle_call(void); > > Put this declaration in a header file and include that header file here > and in the file that defines that function. > Ok, thanks -Trinabh ^ permalink raw reply [flat|nested] 40+ messages in thread
[parent not found: <20110322123336.28725.29810.stgit@tringupt.in.ibm.com>]
[parent not found: <20110323121458.ec7cdaf9.sfr@canb.auug.org.au>]
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm [not found] ` <20110323121458.ec7cdaf9.sfr@canb.auug.org.au> @ 2011-03-23 10:25 ` Trinabh Gupta 2011-03-23 20:32 ` Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Trinabh Gupta @ 2011-03-23 10:25 UTC (permalink / raw) To: Stephen Rothwell Cc: arjan, peterz, lenb, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel Hi Stephen, Thanks for reviewing. On 03/23/2011 06:44 AM, Stephen Rothwell wrote: > Hi, > > On Tue, 22 Mar 2011 18:03:40 +0530 Trinabh Gupta<trinabh@linux.vnet.ibm.com> wrote: >> >> +static int apm_setup_cpuidle(int cpu) >> +{ >> + struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device), >> + GFP_KERNEL); > > Same as xen comments: no NULL check. > >> + int count = CPUIDLE_DRIVER_STATE_START; >> + dev->cpu = cpu; >> + dev->drv =&apm_idle_driver; > > Also wondering why you would ever have a different idle routine on > different cpus? Yes, this is an ongoing debate. Apparently it is a possibility because of ACPI bugs. CPU's can have asymmetric C-states and overall different idle routines on different cpus. Please refer to http://lkml.org/lkml/2009/9/24/132 and https://lkml.org/lkml/2011/2/10/37 for a discussion around this. I have posted a patch series that does global registration i.e same idle routines for each cpu. Please check http://lkml.org/lkml/2011/3/22/161 . That series applies on top of this series. Global registration significantly simplifies the design, but still we are not sure about the direction to take. > >> + >> + dev->states[count] = state_apm_idle; >> + count++; >> + >> + dev->state_count = count; >> + >> + if (cpuidle_register_device(dev)) >> + return -EIO; > > And we leak dev. > >> +static void apm_idle_exit(void) >> +{ >> + cpuidle_unregister_driver(&apm_idle_driver); > > Do we leak the per cpu device structure here? I will see how we can save per cpu device structure pointers so that we can free them. > >> + return; > > Unnecessary return statement. > Thanks, -Trinabh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm 2011-03-23 10:25 ` [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm Trinabh Gupta @ 2011-03-23 20:32 ` Len Brown 2011-03-24 14:28 ` Trinabh Gupta 0 siblings, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-23 20:32 UTC (permalink / raw) To: Trinabh Gupta Cc: Stephen Rothwell, arjan, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel > > Also wondering why you would ever have a different idle routine on > > different cpus? > > Yes, this is an ongoing debate. Apparently it is a possibility > because of ACPI bugs. CPU's can have asymmetric C-states > and overall different idle routines on different cpus. Please > refer to http://lkml.org/lkml/2009/9/24/132 and > https://lkml.org/lkml/2011/2/10/37 for a discussion around this. Althought the ACPI specification allows the BIOS to tell the OS about different C-states per-processor, I know of zero system in the field and zero systems in development that require that capability. That isn't a guarantee that capability will never be used, but I'm not holding my breath. If there are systems with broken tables that make them appear asymetric, then we should have a workaround that handles that case, rather than complicating the normal code for the broken case. So I recommend deleting the extra per-cpu registration stuff unless there is some other architecture that requires it and can't hadle the asymmetry in another way. > I have posted a patch series that does global registration > i.e same idle routines for each cpu. Please check > http://lkml.org/lkml/2011/3/22/161 . That series applies on > top of this series. Global registration significantly > simplifies the design, but still we are not sure about the > direction to take. I'll review that. thanks, Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm 2011-03-23 20:32 ` Len Brown @ 2011-03-24 14:28 ` Trinabh Gupta 2011-03-24 16:21 ` Vaidyanathan Srinivasan 2011-03-25 7:24 ` Len Brown 0 siblings, 2 replies; 40+ messages in thread From: Trinabh Gupta @ 2011-03-24 14:28 UTC (permalink / raw) To: Len Brown, arjan Cc: Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On 03/24/2011 02:02 AM, Len Brown wrote: >>> Also wondering why you would ever have a different idle routine on >>> different cpus? >> >> Yes, this is an ongoing debate. Apparently it is a possibility >> because of ACPI bugs. CPU's can have asymmetric C-states >> and overall different idle routines on different cpus. Please >> refer to http://lkml.org/lkml/2009/9/24/132 and >> https://lkml.org/lkml/2011/2/10/37 for a discussion around this. > > Althought the ACPI specification allows the BIOS to tell the OS > about different C-states per-processor, I know of zero system > in the field and zero systems in development that require that > capability. That isn't a guarantee that capability will never > be used, but I'm not holding my breath. > > If there are systems with broken tables that make them > appear asymetric, then we should have a workaround that handles > that case, rather than complicating the normal code for > the broken case. > > So I recommend deleting the extra per-cpu registration stuff > unless there is some other architecture that requires it > and can't hadle the asymmetry in another way. Yes, lets go forward with removal of per-cpu registration and handle rare case of asymmetry in some other may. Using intersection or union of C-states for each cpu may be a solution. Using intersection or lowest common C-state has the corner case that we could have packages/cores supporting a new lower C-state in case of thermal limit and they would want OS to go to this state. Using intersection or lowest common C-state may prevent this. Another option is to use union of C-states; but I am not sure what happens if a CPU uses a state that is not reported for it??? Maybe there is some other way to handle asymmetry ?? > >> I have posted a patch series that does global registration >> i.e same idle routines for each cpu. Please check >> http://lkml.org/lkml/2011/3/22/161 . That series applies on >> top of this series. Global registration significantly >> simplifies the design, but still we are not sure about the >> direction to take. > > I'll review that. Thanks; please review especially the data structure changes https://lkml.org/lkml/2011/3/22/162 -Trinabh ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm 2011-03-24 14:28 ` Trinabh Gupta @ 2011-03-24 16:21 ` Vaidyanathan Srinivasan 2011-03-25 7:24 ` Len Brown 1 sibling, 0 replies; 40+ messages in thread From: Vaidyanathan Srinivasan @ 2011-03-24 16:21 UTC (permalink / raw) To: Trinabh Gupta Cc: Len Brown, arjan, Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel * Trinabh Gupta <trinabh@linux.vnet.ibm.com> [2011-03-24 19:58:29]: > > > On 03/24/2011 02:02 AM, Len Brown wrote: > >>>Also wondering why you would ever have a different idle routine on > >>>different cpus? > >> > >>Yes, this is an ongoing debate. Apparently it is a possibility > >>because of ACPI bugs. CPU's can have asymmetric C-states > >>and overall different idle routines on different cpus. Please > >>refer to http://lkml.org/lkml/2009/9/24/132 and > >>https://lkml.org/lkml/2011/2/10/37 for a discussion around this. > > > >Althought the ACPI specification allows the BIOS to tell the OS > >about different C-states per-processor, I know of zero system > >in the field and zero systems in development that require that > >capability. That isn't a guarantee that capability will never > >be used, but I'm not holding my breath. > > > >If there are systems with broken tables that make them > >appear asymetric, then we should have a workaround that handles > >that case, rather than complicating the normal code for > >the broken case. > > > >So I recommend deleting the extra per-cpu registration stuff > >unless there is some other architecture that requires it > >and can't hadle the asymmetry in another way. Hi Len, The fear of breaking legacy hardware is what is holding us. Arjan also pointed that there are systems that has asymmetric C-States either intentionally or due to a bug. I agree with you that we should remove per-cpu registration at this point and move forward with a single registration. We can workaround the corner cases. > Yes, lets go forward with removal of per-cpu registration > and handle rare case of asymmetry in some other may. Yes. Lets discuss the design/problems on the other patch series. (http://lkml.org/lkml/2011/3/22/161) > Using intersection or union of C-states for each cpu may > be a solution. Using intersection or lowest common C-state > has the corner case that we could have packages/cores > supporting a new lower C-state in case of thermal limit and > they would want OS to go to this state. Using intersection > or lowest common C-state may prevent this. > > Another option is to use union of C-states; > but I am not sure what happens if a CPU uses a state that > is not reported for it??? > > Maybe there is some other way to handle asymmetry ?? The simplest method will be a union of all C-States. Basically let the CPU that reports the maximum C-States to register or override the current setup. During boot-up allow the first CPU to do the registration, but later override if another CPU comes up with more C-States. This will work assuming that calling a new (deeper) C-State on CPUs that did not report them will cause no harm. It should degenerate to closest supported C-State. > > > >>I have posted a patch series that does global registration > >>i.e same idle routines for each cpu. Please check > >>http://lkml.org/lkml/2011/3/22/161 . That series applies on > >>top of this series. Global registration significantly > >>simplifies the design, but still we are not sure about the > >>direction to take. > > > >I'll review that. > Thanks; please review especially the data structure changes > https://lkml.org/lkml/2011/3/22/162 Single registration will allow us to combine struct cpuidle_device and struct cpuidle_driver, and simplify the framework for large systems. AFAIK other archs like powerpc or ARM would not need per-cpu definitions. --Vaidy ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm 2011-03-24 14:28 ` Trinabh Gupta 2011-03-24 16:21 ` Vaidyanathan Srinivasan @ 2011-03-25 7:24 ` Len Brown 2011-03-25 18:01 ` Vaidyanathan Srinivasan 1 sibling, 1 reply; 40+ messages in thread From: Len Brown @ 2011-03-25 7:24 UTC (permalink / raw) To: Trinabh Gupta Cc: arjan, Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel > Maybe there is some other way to handle asymmetry ?? When somebody invents a system that needs asymmetry on purpose, that is the time to get fancy. Until that day, simply print a FW_BUG WARNING and fall back to default_idle(). cheers, Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm 2011-03-25 7:24 ` Len Brown @ 2011-03-25 18:01 ` Vaidyanathan Srinivasan 2011-03-31 2:17 ` cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) Len Brown 0 siblings, 1 reply; 40+ messages in thread From: Vaidyanathan Srinivasan @ 2011-03-25 18:01 UTC (permalink / raw) To: Len Brown Cc: Trinabh Gupta, arjan, Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel * Len Brown <lenb@kernel.org> [2011-03-25 03:24:24]: > > > Maybe there is some other way to handle asymmetry ?? > > When somebody invents a system that needs asymmetry on purpose, > that is the time to get fancy. Until that day, simply print a > FW_BUG WARNING and fall back to default_idle(). We may print a warning but not switch to default_idle(). There could be laptops that gets an extra C-State when run on battery. So when the ACPI _PPC event happens for the C-State change, we need to handle the situation such that the states are refreshed for all CPUs in one step. Basically while we are updating the new C-State, the framework should not detect this as asymmetric and switch to default_idle(). Please fell free to suggest a more elegant solution to handle the runtime C-State change event, though symmetric across all CPUs. --Vaidy ^ permalink raw reply [flat|nested] 40+ messages in thread
* cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-03-25 18:01 ` Vaidyanathan Srinivasan @ 2011-03-31 2:17 ` Len Brown 2011-03-31 13:18 ` Peter Zijlstra 2011-04-01 7:02 ` Trinabh Gupta 0 siblings, 2 replies; 40+ messages in thread From: Len Brown @ 2011-03-31 2:17 UTC (permalink / raw) To: Vaidyanathan Srinivasan Cc: Trinabh Gupta, arjan, Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel > > > Maybe there is some other way to handle asymmetry ?? I mis-spoke on asymmetry. Moorestown is already an example of an asymmetric system, since its deepest c-state is available on cpu0, but not on cpu1. So it needs different tables for each cpu. I think what would work is a default c-state table for the system, and the ability of a per-cpu override table. I think that would gracefully handle the case of many identical cpus, and also systems with different tables per cpu. The same goes for write-access to the tables. In the typical case, a single table can be shared for the entire system and nobody will be writing to it. However, with the governor changes to call dev->prepare and sift through all the states to find the legal one with the lowest power_usage... There is software today out of tree that updates that power_usage entry from prepare(). As I mentioned, I'm not fond of that mechanism - it looks racey to me. I'd rather see the capability of a drivers idle handler to demote to another handler in the driver and for the accounting to not get messed up when that happens. I think the way to do that is to let the driver do the accounting rather than doing it in the cpuidle caller. cheers, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-03-31 2:17 ` cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) Len Brown @ 2011-03-31 13:18 ` Peter Zijlstra 2011-04-01 4:09 ` Len Brown 2011-04-01 7:02 ` Trinabh Gupta 1 sibling, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2011-03-31 13:18 UTC (permalink / raw) To: Len Brown Cc: Stephen Rothwell, ak, suresh.b.siddha, venki, benh, linux-kernel, xen-devel, Vaidyanathan Srinivasan, arjan, Trinabh Gupta On Wed, 2011-03-30 at 22:17 -0400, Len Brown wrote: > > Moorestown is already an example of an asymmetric system, > since its deepest c-state is available on cpu0, but not on cpu1. > So it needs different tables for each cpu. wtf are these hardware guys smoking and how the heck are we supposed to schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-03-31 13:18 ` Peter Zijlstra @ 2011-04-01 4:09 ` Len Brown 2011-04-01 8:15 ` Dipankar Sarma 2011-04-01 14:02 ` Peter Zijlstra 0 siblings, 2 replies; 40+ messages in thread From: Len Brown @ 2011-04-01 4:09 UTC (permalink / raw) To: Peter Zijlstra Cc: Vaidyanathan Srinivasan, Trinabh Gupta, arjan, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel > > Moorestown is already an example of an asymmetric system, > > since its deepest c-state is available on cpu0, but not on cpu1. > > So it needs different tables for each cpu. > > wtf are these hardware guys smoking and how the heck are we supposed to > schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? they are smoking micro-amps:-) S0i3 on cpu0 can be entered only after cpu1 is already off-line, among other system hardware dependencies... So it makes no sense to export S0i3 as a c-state on cpu1. When cpu1 is online, the scheduler treats it as a normal SMP. cheers, -Len Brown, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-01 4:09 ` Len Brown @ 2011-04-01 8:15 ` Dipankar Sarma 2011-04-01 14:38 ` Arjan van de Ven 2011-04-01 14:02 ` Peter Zijlstra 1 sibling, 1 reply; 40+ messages in thread From: Dipankar Sarma @ 2011-04-01 8:15 UTC (permalink / raw) To: Len Brown Cc: Peter Zijlstra, Vaidyanathan Srinivasan, Trinabh Gupta, arjan, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On Fri, Apr 01, 2011 at 12:09:25AM -0400, Len Brown wrote: > > > Moorestown is already an example of an asymmetric system, > > > since its deepest c-state is available on cpu0, but not on cpu1. > > > So it needs different tables for each cpu. > > > > wtf are these hardware guys smoking and how the heck are we supposed to > > schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? > > they are smoking micro-amps:-) > > S0i3 on cpu0 can be entered only after cpu1 is already off-line, > among other system hardware dependencies... > > So it makes no sense to export S0i3 as a c-state on cpu1. > > When cpu1 is online, the scheduler treats it as a normal SMP. Isn't S0i3 a "system" state, as opposed to cpu state ? Perhaps we can treat it as such and still keep the c-states symmetric. The cpu can transition to S0i3 from any cpu as long as others are offlined, no ? In that sense, really all the cpus would be in S0i3, which would make it symmetric. If this isn't how mrst cpuidle works, then cpuidle accounting may be broken in principle because of this. Thanks Dipankar ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-01 8:15 ` Dipankar Sarma @ 2011-04-01 14:38 ` Arjan van de Ven 2011-04-03 16:18 ` Dipankar Sarma 0 siblings, 1 reply; 40+ messages in thread From: Arjan van de Ven @ 2011-04-01 14:38 UTC (permalink / raw) To: dipankar Cc: Len Brown, Peter Zijlstra, Vaidyanathan Srinivasan, Trinabh Gupta, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On 4/1/2011 1:15 AM, Dipankar Sarma wrote: > On Fri, Apr 01, 2011 at 12:09:25AM -0400, Len Brown wrote: >>>> Moorestown is already an example of an asymmetric system, >>>> since its deepest c-state is available on cpu0, but not on cpu1. >>>> So it needs different tables for each cpu. >>> wtf are these hardware guys smoking and how the heck are we supposed to >>> schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? >> they are smoking micro-amps:-) >> >> S0i3 on cpu0 can be entered only after cpu1 is already off-line, >> among other system hardware dependencies... >> >> So it makes no sense to export S0i3 as a c-state on cpu1. >> >> When cpu1 is online, the scheduler treats it as a normal SMP. > Isn't S0i3 a "system" state, as opposed to cpu state ? it's misnamed. it's a C state to the OS. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-01 14:38 ` Arjan van de Ven @ 2011-04-03 16:18 ` Dipankar Sarma 0 siblings, 0 replies; 40+ messages in thread From: Dipankar Sarma @ 2011-04-03 16:18 UTC (permalink / raw) To: Arjan van de Ven Cc: Len Brown, Peter Zijlstra, Vaidyanathan Srinivasan, Trinabh Gupta, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On Fri, Apr 01, 2011 at 07:38:23AM -0700, Arjan van de Ven wrote: > On 4/1/2011 1:15 AM, Dipankar Sarma wrote: > >On Fri, Apr 01, 2011 at 12:09:25AM -0400, Len Brown wrote: > >>>>Moorestown is already an example of an asymmetric system, > >>>>since its deepest c-state is available on cpu0, but not on cpu1. > >>>>So it needs different tables for each cpu. > >>>wtf are these hardware guys smoking and how the heck are we supposed to > >>>schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? > >>they are smoking micro-amps:-) > >> > >>S0i3 on cpu0 can be entered only after cpu1 is already off-line, > >>among other system hardware dependencies... > >> > >>So it makes no sense to export S0i3 as a c-state on cpu1. > >> > >>When cpu1 is online, the scheduler treats it as a normal SMP. > >Isn't S0i3 a "system" state, as opposed to cpu state ? > > it's misnamed. it's a C state to the OS. I understand that it has been implemented as a C-state from this - http://build.meego.com/package/view_file?file=linux-2.6.37-mrst-s0i3.patch&package=kernel-adaptation-mrst&project=home%3Adliu9&srcmd5=a0929a2863150f5c8454507d6cd8f09d The key question is this - +int mrst_check_state_availability(struct cpuidle_device *dev) +{ + int cpu = smp_processor_id(); + + /* + * If there is another CPU running, the GPU is active, + * the PMU is uninitialized, or there is a still-unprocessed + * PMU command, we cannot enter S0i3. + */ + if (!pmu_reg || !cpumask_equal(cpu_online_mask, cpumask_of(cpu)) || + s0i3_pmu_command_pending) + dev->states[5].flags |= CPUIDLE_FLAG_IGNORE; + else + dev->states[5].flags &= ~CPUIDLE_FLAG_IGNORE; Is this really asymetric ? Or is it that if the other cpu(s) are in C6, the chip can enter S0i3 ? If that is the case, then isn't this equivalent to all the cpus in the chip entering S0i3 and thus symmetrical ? Also, if going to S0i3 relies on other cpus offlined, then we don't need to worry about asymetry from the scheduler/cpuidle point of view, no ? Thanks Dipankar ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-01 4:09 ` Len Brown 2011-04-01 8:15 ` Dipankar Sarma @ 2011-04-01 14:02 ` Peter Zijlstra 2011-04-04 14:32 ` Dipankar Sarma 1 sibling, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2011-04-01 14:02 UTC (permalink / raw) To: Len Brown Cc: Vaidyanathan Srinivasan, Trinabh Gupta, arjan, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On Fri, 2011-04-01 at 00:09 -0400, Len Brown wrote: > > > Moorestown is already an example of an asymmetric system, > > > since its deepest c-state is available on cpu0, but not on cpu1. > > > So it needs different tables for each cpu. > > > > wtf are these hardware guys smoking and how the heck are we supposed to > > schedule on such a machine? Prefer to keep cpu1 busy while idling cpu0? > > they are smoking micro-amps:-) Has anybody told them that pushing lots of logic into software generally burns more amps because it keeps the thing running longer? > S0i3 on cpu0 can be entered only after cpu1 is already off-line, > among other system hardware dependencies... > > So it makes no sense to export S0i3 as a c-state on cpu1. > > When cpu1 is online, the scheduler treats it as a normal SMP. Dipankar's reply seems to address this issue well. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-01 14:02 ` Peter Zijlstra @ 2011-04-04 14:32 ` Dipankar Sarma 2011-04-05 15:01 ` Peter Zijlstra 0 siblings, 1 reply; 40+ messages in thread From: Dipankar Sarma @ 2011-04-04 14:32 UTC (permalink / raw) To: Peter Zijlstra Cc: Len Brown, Vaidyanathan Srinivasan, Trinabh Gupta, arjan, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On Fri, Apr 01, 2011 at 04:02:36PM +0200, Peter Zijlstra wrote: > > > S0i3 on cpu0 can be entered only after cpu1 is already off-line, > > among other system hardware dependencies... > > > > So it makes no sense to export S0i3 as a c-state on cpu1. > > > > When cpu1 is online, the scheduler treats it as a normal SMP. > > Dipankar's reply seems to address this issue well. I can't find any Moorestown documentation at the Intel site, but thinking about Len's inputs a bit more, it seems there may be still a problem asymetry from the scheduler perspective. If cpu0 or cpu1 either of them can be offlined, there is no asymetry. If only cpu1 can be offlined, it would mean that one cpu may be more efficient depending on how we do cpu offlining for power savings. It gets a bit messy. Len, what exacty is the significance of offlining here ? Apart from going to C6, what else is needed in cpu1 for the chip to go to S0i3 ? Why is idle C6 not enough ? Thanks Dipankar ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-04 14:32 ` Dipankar Sarma @ 2011-04-05 15:01 ` Peter Zijlstra 2011-04-05 15:48 ` Dipankar Sarma 0 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2011-04-05 15:01 UTC (permalink / raw) To: dipankar Cc: Stephen Rothwell, ak, suresh.b.siddha, venki, benh, linux-kernel, xen-devel, Vaidyanathan Srinivasan, arjan, Trinabh Gupta, Len Brown On Mon, 2011-04-04 at 20:02 +0530, Dipankar Sarma wrote: > On Fri, Apr 01, 2011 at 04:02:36PM +0200, Peter Zijlstra wrote: > > > > > S0i3 on cpu0 can be entered only after cpu1 is already off-line, > > > among other system hardware dependencies... > > > > > > So it makes no sense to export S0i3 as a c-state on cpu1. > > > > > > When cpu1 is online, the scheduler treats it as a normal SMP. > > > > Dipankar's reply seems to address this issue well. > > I can't find any Moorestown documentation at the Intel site, but > thinking about Len's inputs a bit more, it seems there may > be still a problem asymetry from the scheduler perspective. > > If cpu0 or cpu1 either of them can be offlined, there is no > asymetry. If only cpu1 can be offlined, it would mean that > one cpu may be more efficient depending on how we do > cpu offlining for power savings. It gets a bit messy. > > Len, what exacty is the significance of offlining here ? > Apart from going to C6, what else is needed in cpu1 for > the chip to go to S0i3 ? Why is idle C6 not enough ? I don't think offlining is relevant, anybody using that for power management is doing it wrong, _very_ wrong. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-04-05 15:01 ` Peter Zijlstra @ 2011-04-05 15:48 ` Dipankar Sarma 0 siblings, 0 replies; 40+ messages in thread From: Dipankar Sarma @ 2011-04-05 15:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Len Brown, Vaidyanathan Srinivasan, Trinabh Gupta, arjan, Stephen Rothwell, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On Tue, Apr 05, 2011 at 05:01:32PM +0200, Peter Zijlstra wrote: > On Mon, 2011-04-04 at 20:02 +0530, Dipankar Sarma wrote: > > On Fri, Apr 01, 2011 at 04:02:36PM +0200, Peter Zijlstra wrote: > > I can't find any Moorestown documentation at the Intel site, but > > thinking about Len's inputs a bit more, it seems there may > > be still a problem asymetry from the scheduler perspective. > > > > If cpu0 or cpu1 either of them can be offlined, there is no > > asymetry. If only cpu1 can be offlined, it would mean that > > one cpu may be more efficient depending on how we do > > cpu offlining for power savings. It gets a bit messy. > > > > Len, what exacty is the significance of offlining here ? > > Apart from going to C6, what else is needed in cpu1 for > > the chip to go to S0i3 ? Why is idle C6 not enough ? > > I don't think offlining is relevant, anybody using that for power > management is doing it wrong, _very_ wrong. I am suggesting that it depends on the offlining logic. If cpu1 is being used as an added co-processor for some specific apps and mostly offline otherwise, it may not be an issue. If offlining is being used as a meta-scheduler over the kernel scheduler (like power savings or whatever logic) than it will cause asymmetry problems dependent on the ooffline logic - e.g. it may be more advantageous for the kernel scheduler to schedule on cpu0 keeping cpu1 free leading to S0i3 more often. Not advocating it when we are trying to run away from it on powerpc :) For now, it seems we are OK in handling S0i3 through cpuidle, just that it would be nice to understand the overall offline logic and why it is needed. Similar questions have come up in my discussions with ARM guys in the recent times as well. Thanks Dipankar ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) 2011-03-31 2:17 ` cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) Len Brown 2011-03-31 13:18 ` Peter Zijlstra @ 2011-04-01 7:02 ` Trinabh Gupta 1 sibling, 0 replies; 40+ messages in thread From: Trinabh Gupta @ 2011-04-01 7:02 UTC (permalink / raw) To: Len Brown Cc: Vaidyanathan Srinivasan, arjan, Stephen Rothwell, peterz, suresh.b.siddha, benh, venki, ak, linux-kernel, xen-devel On 03/31/2011 07:47 AM, Len Brown wrote: >>>> Maybe there is some other way to handle asymmetry ?? > > I mis-spoke on asymmetry. > > Moorestown is already an example of an asymmetric system, > since its deepest c-state is available on cpu0, but not on cpu1. > So it needs different tables for each cpu. > > I think what would work is a default c-state table for the system, > and the ability of a per-cpu override table. I think that would > gracefully handle the case of many identical cpus, and also systems > with different tables per cpu. Hi Len, What would happen if a cpu enters a state which wasn't reported for it? I am wondering if an approach like union of states of each would work. Can we handle asymmetry through checking and demotion inside the routine itself; just like you are proposing as dev->prepare alternative? But I guess this may not be efficient if this happens often. I am not sure if having a per-cpu override would be very tidy (ideas ?); and much better than per-cpu stuff. So just want to check what would be the best way forward? > > The same goes for write-access to the tables. > In the typical case, a single table can be shared for the entire system > and nobody will be writing to it. However, with the governor changes > to call dev->prepare and sift through all the states to find the > legal one with the lowest power_usage... There is software today > out of tree that updates that power_usage entry from prepare(). > > As I mentioned, I'm not fond of that mechanism - it looks racey > to me. I'd rather see the capability of a drivers idle handler > to demote to another handler in the driver and for the accounting > to not get messed up when that happens. I think the way to do that > is to let the driver do the accounting rather than doing it in > the cpuidle caller. I agree with this; we should move update of statistics inside the driver routines (enter routines). They already take device/stats structure as parameter. Thanks, -Trinabh ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2011-04-05 15:48 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110322123208.28725.30945.stgit@tringupt.in.ibm.com>
[not found] ` <20110322123324.28725.3131.stgit@tringupt.in.ibm.com>
2011-03-22 14:50 ` [RFC PATCH V4 4/5] cpuidle: driver for xen Konrad Rzeszutek Wilk
2011-03-23 9:57 ` Trinabh Gupta
2011-03-24 7:18 ` Len Brown
2011-03-24 12:05 ` Konrad Rzeszutek Wilk
2011-03-25 7:19 ` Len Brown
2011-03-25 14:43 ` Jeremy Fitzhardinge
2011-03-25 14:38 ` [Xen-devel] " Jeremy Fitzhardinge
2011-03-31 2:02 ` Len Brown
2011-03-31 21:26 ` Len Brown
2011-03-31 22:36 ` [Xen-devel] " Jeremy Fitzhardinge
2011-04-01 3:03 ` Len Brown
[not found] ` <20110322123233.28725.92874.stgit@tringupt.in.ibm.com>
[not found] ` <alpine.LFD.2.02.1103222254420.10549@x980>
2011-03-23 9:22 ` [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection Trinabh Gupta
2011-03-23 20:51 ` Len Brown
2011-03-24 4:41 ` Len Brown
2011-03-24 14:13 ` Trinabh Gupta
2011-03-24 16:52 ` Vaidyanathan Srinivasan
2011-03-25 7:13 ` Len Brown
2011-03-25 7:05 ` Len Brown
2011-03-25 15:35 ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-03-31 2:25 ` Len Brown
[not found] ` <20110322123244.28725.32435.stgit@tringupt.in.ibm.com>
[not found] ` <alpine.LFD.2.02.1103222304300.10549@x980>
2011-03-23 9:31 ` [RFC PATCH V4 3/5] cpuidle: default idle driver for x86 Trinabh Gupta
2011-03-24 16:32 ` Vaidyanathan Srinivasan
[not found] ` <20110322123223.28725.21929.stgit@tringupt.in.ibm.com>
[not found] ` <20110323120044.bb8c0ae1.sfr@canb.auug.org.au>
2011-03-23 10:10 ` [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer " Trinabh Gupta
[not found] ` <20110322123336.28725.29810.stgit@tringupt.in.ibm.com>
[not found] ` <20110323121458.ec7cdaf9.sfr@canb.auug.org.au>
2011-03-23 10:25 ` [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm Trinabh Gupta
2011-03-23 20:32 ` Len Brown
2011-03-24 14:28 ` Trinabh Gupta
2011-03-24 16:21 ` Vaidyanathan Srinivasan
2011-03-25 7:24 ` Len Brown
2011-03-25 18:01 ` Vaidyanathan Srinivasan
2011-03-31 2:17 ` cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm) Len Brown
2011-03-31 13:18 ` Peter Zijlstra
2011-04-01 4:09 ` Len Brown
2011-04-01 8:15 ` Dipankar Sarma
2011-04-01 14:38 ` Arjan van de Ven
2011-04-03 16:18 ` Dipankar Sarma
2011-04-01 14:02 ` Peter Zijlstra
2011-04-04 14:32 ` Dipankar Sarma
2011-04-05 15:01 ` Peter Zijlstra
2011-04-05 15:48 ` Dipankar Sarma
2011-04-01 7:02 ` Trinabh Gupta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).