* [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
@ 2014-08-14 2:11 Chuansheng Liu
From: Chuansheng Liu @ 2014-08-14 2:11 UTC
To: rjw, daniel.lezcano
Cc: linux-pm, linux-kernel, changcheng.liu, xiaoming.wang,
souvik.k.chakravarty, chuansheng.liu
We found that sometimes, even after PM_QOS is set back to DEFAULT,
the CPU stays stuck at C0 for 2-3s: it does not reselect a suitable
C-state immediately after receiving the IPI.
The code pattern is simply:
{
	pm_qos_update_request(&pm_qos, C1 - 1);
		<== Here keep all cores at C0
	...;
	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
		<== Here some cores still stuck at C0 for 2-3s
}
The reason is that when pm_qos goes back to DEFAULT, an IPI is sent to
wake up the cores, but when a core is in the poll idle state, the IPI
cannot break the polling loop.
So in the IPI callback, when the idle task is currently running, we need
to forcibly set the reschedule bit to break the polling loop. The other,
non-polling idle states are broken by the IPI directly, and setting the
reschedule bit does them no harm either.
With this fix, we saved about 30mV power on our Android platform.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
---
drivers/cpuidle/cpuidle.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e..9e28a13 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -532,7 +532,13 @@ EXPORT_SYMBOL_GPL(cpuidle_register);
 static void smp_callback(void *v)
 {
-	/* we already woke the CPU up, nothing more to do */
+	/* we already woke the CPU up, and when the corresponding
+	 * CPU is at polling idle state, we need to set the sched
+	 * bit to trigger reselect the new suitable C-state, it
+	 * will be helpful for power.
+	 */
+	if (is_idle_task(current))
+		set_tsk_need_resched(current);
 }
/*
--
1.7.9.5
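A minimal userspace sketch of the behaviour the patch description reports, assuming the polling loop only ever tests the reschedule bit. The names model_cpu, model_smp_callback and model_poll_idle are hypothetical stand-ins, not kernel APIs, and the IPI is modelled as a plain function call on a chosen loop iteration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; not a kernel structure. */
struct model_cpu {
	bool need_resched;	/* stands in for the reschedule bit */
};

/* Models smp_callback() running in IPI context on the polling CPU. */
static void model_smp_callback(struct model_cpu *cpu, bool set_resched)
{
	if (set_resched)
		cpu->need_resched = true;
}

/*
 * Models the poll_idle() loop. The IPI is delivered on iteration ipi_at.
 * Returns the number of iterations spent polling, capped at max_spins.
 */
static unsigned long model_poll_idle(struct model_cpu *cpu,
				     unsigned long ipi_at, bool set_resched,
				     unsigned long max_spins)
{
	unsigned long spins = 0;

	assert(cpu);
	while (spins < max_spins && !cpu->need_resched) {
		if (spins == ipi_at)
			model_smp_callback(cpu, set_resched);
		spins++;
	}
	return spins;
}
```

With set_resched false (the old, empty callback) the model keeps polling until the cap; with set_resched true (the behaviour the patch adds) it leaves the loop right after the callback runs.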
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Peter Zijlstra @ 2014-08-14 10:53 UTC
To: Daniel Lezcano
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
> Hi Chuansheng,
>
> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote:
>
> > [...]
> >  static void smp_callback(void *v)
> >  {
> > -	/* we already woke the CPU up, nothing more to do */
> > +	/* we already woke the CPU up, and when the corresponding
> > +	 * CPU is at polling idle state, we need to set the sched
> > +	 * bit to trigger reselect the new suitable C-state, it
> > +	 * will be helpful for power.
> > +	 */
> > +	if (is_idle_task(current))
> > +		set_tsk_need_resched(current);
>
> Mmh, shouldn't we inspect the polling flag instead ? Peter (Cc'ed) did some
> changes around this and I think we should ask its opinion. I am not sure
> this code won't make all cpu to return to the scheduler and go back to the
> idle task.

Yes, this is wrong.. Also cpuidle should not know about this, so this is
very much the wrong place to go fix this. Lemme have a look.
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Liu, Chuansheng @ 2014-08-14 11:24 UTC
To: Peter Zijlstra, Daniel Lezcano
Cc: Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML, Liu, Changcheng,
	Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, August 14, 2014 6:54 PM
> To: Daniel Lezcano
> Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
> > [...]
> > > +	if (is_idle_task(current))
> > > +		set_tsk_need_resched(current);
> >
> > Mmh, shouldn't we inspect the polling flag instead ? Peter (Cc'ed) did some
> > changes around this and I think we should ask its opinion. I am not sure
> > this code won't make all cpu to return to the scheduler and go back to the
> > idle task.
>
> Yes, this is wrong.. Also cpuidle should not know about this, so this is
> very much the wrong place to go fix this. Lemme have a look.

If inspecting the polling flag, we can not fix the race between poll_idle and
smp_callback, since in poll_idle(), before set polling flag, if the
smp_callback come in, then no resched bit set, after that, poll_idle() will do
the polling action, without reselection immediately, it will bring power
regression here.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Peter Zijlstra @ 2014-08-14 13:13 UTC
To: Liu, Chuansheng
Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	Liu, Changcheng, Wang, Xiaoming, Chakravarty, Souvik K

On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote:
> If inspecting the polling flag, we can not fix the race between poll_idle and smp_callback,
> since in poll_idle(), before set polling flag, if the smp_callback come in, then no resched bit set,
> after that, poll_idle() will do the polling action, without reselection immediately, it will bring power
> regression here.

-ENOPARSE. Is there a question there?
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Liu, Chuansheng @ 2014-08-14 14:10 UTC
To: Peter Zijlstra
Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	Liu, Changcheng, Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, August 14, 2014 9:13 PM
> To: Liu, Chuansheng
> Cc: Daniel Lezcano; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote:
> > If inspecting the polling flag, we can not fix the race between poll_idle and
> > smp_callback, since in poll_idle(), before set polling flag, if the
> > smp_callback come in, then no resched bit set, after that, poll_idle() will
> > do the polling action, without reselection immediately, it will bring power
> > regression here.
>
> -ENOPARSE. Is there a question there?

Lezcano suggest to inspect the polling flag, then code is like below:
smp_callback() {
	if (polling_flag)
		set_resched_bit;
}

And the poll_idle code is like below:
static int poll_idle(struct cpuidle_device *dev,
		struct cpuidle_driver *drv, int index)
{
	local_irq_enable();
	if (!current_set_polling_and_test()) {
		while (!need_resched())
			cpu_relax();
	}
	current_clr_polling();

	return index;
}

The race is:
Idle task:
	poll_idle
		local_irq_enable()
		<== IPI interrupt coming, check the polling flag is not set yet, do nothing;
Come back to poll_idle, it will stay in the poll loop for a while, instead break
it immediately to let governor reselect the right C-state.
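The ordering problem described in the message above can be replayed deterministically in userspace. This is only a sketch under stated assumptions: model_idle, cb_check_polling, cb_always and model_stuck_polling are hypothetical names, the IPI is a direct function call, and no real interrupt or memory-ordering behaviour is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the idle task's flags; not kernel structures. */
struct model_idle {
	bool polling;		/* stands in for the polling flag */
	bool need_resched;	/* stands in for the reschedule bit */
};

/* Callback variant that only kicks a CPU already marked as polling. */
static void cb_check_polling(struct model_idle *t)
{
	if (t->polling)
		t->need_resched = true;
}

/* Callback variant that sets the reschedule bit unconditionally. */
static void cb_always(struct model_idle *t)
{
	t->need_resched = true;
}

/*
 * Replays poll_idle() entry with the IPI landing either before or after
 * the polling flag is set. Returns true if the loop is left spinning.
 */
static bool model_stuck_polling(void (*cb)(struct model_idle *),
				bool ipi_before_flag)
{
	struct model_idle t = { false, false };

	assert(cb);
	/* local_irq_enable(): from here on the IPI may be delivered */
	if (ipi_before_flag)
		cb(&t);
	/* current_set_polling_and_test() */
	t.polling = true;
	if (t.need_resched)
		return false;	/* the polling loop is skipped entirely */
	/* otherwise the IPI arrives while the CPU is already polling */
	if (!ipi_before_flag)
		cb(&t);
	return !t.need_resched;
}
```

In this model only the combination of cb_check_polling with an early IPI leaves the CPU spinning, which is the window the message points at.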
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Daniel Lezcano @ 2014-08-14 14:17 UTC
To: Liu, Chuansheng, Peter Zijlstra
Cc: Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML, Liu, Changcheng,
	Wang, Xiaoming, Chakravarty, Souvik K

On 08/14/2014 04:10 PM, Liu, Chuansheng wrote:
> [...]
> Lezcano suggest to inspect the polling flag, then code is like below:
> smp_callback() {
> 	if (polling_flag)
> 		set_resched_bit;
> }
>
> And the poll_idle code is like below:
> static int poll_idle(struct cpuidle_device *dev,
> 		struct cpuidle_driver *drv, int index)
> {
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched())

Or alternatively, something like:

	while (!need_resched() && !kickme) {
		...
	}

smp_callback()
{
	kickme = 1;
}

kickme is a percpu variable and set to zero when exiting the 'enter'
callback.

So we don't mess with the polling flag, which is already a bit tricky.

This patch is very straightforward to illustrate the idea.

> 			cpu_relax();
> 	}
> 	current_clr_polling();
>
> 	return index;
> }
>
> The race is:
> Idle task:
> 	poll_idle
> 		local_irq_enable()
> 		<== IPI interrupt coming, check the polling flag is not set yet, do nothing;
> Come back to poll_idle, it will stay in the poll loop for a while, instead break
> it immediately to let governor reselect the right C-state.
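A rough userspace sketch of the kickme idea, assuming the loop should end when either the reschedule bit or the kick flag is set, so the condition here is written as !need_resched && !kickme. The arrays, MODEL_NR_CPUS and both function names are hypothetical; a real implementation would use a proper percpu variable and would have to consider memory ordering, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU flags; the kernel would use a real percpu variable. */
static bool kickme[MODEL_NR_CPUS];
static bool need_resched_flag[MODEL_NR_CPUS];

/* Models smp_callback(): only marks the CPU as kicked. */
static void model_smp_callback(int cpu)
{
	kickme[cpu] = true;
}

/*
 * Models poll_idle() with the extra exit condition. The kick is delivered
 * on iteration kick_at. Returns the iterations spent polling (capped).
 */
static unsigned long model_poll_idle(int cpu, unsigned long kick_at,
				     unsigned long max_spins)
{
	unsigned long spins = 0;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	while (spins < max_spins && !need_resched_flag[cpu] && !kickme[cpu]) {
		if (spins == kick_at)
			model_smp_callback(cpu);
		spins++;
	}
	kickme[cpu] = false;	/* cleared when leaving the 'enter' callback */
	return spins;
}
```

Because the flag is cleared only on exit in this model, a kick that lands before the loop starts is still seen and the loop body never runs, and the reschedule bit is left untouched throughout.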
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Liu, Chuansheng @ 2014-08-14 14:26 UTC
To: Daniel Lezcano, Peter Zijlstra
Cc: Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML, Liu, Changcheng,
	Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org]
> Sent: Thursday, August 14, 2014 10:17 PM
> To: Liu, Chuansheng; Peter Zijlstra
> Cc: Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, Changcheng;
> Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> [...]
> Or alternatively, something like:
>
> 	while (!need_resched() && !kickme) {
> 		...
> 	}
>
> smp_callback()
> {
> 	kickme = 1;
> }
>
> kickme is a percpu variable and set to zero when exiting the 'enter'
> callback.
>
> So we don't mess with the polling flag, which is already a bit tricky.
>
> This patch is very straightforward to illustrate the idea.

Thanks Lezcano, the new kickme flag sounds like it makes things simple. I will
try to send a new patch for review :)
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Peter Zijlstra @ 2014-08-14 11:00 UTC
To: Daniel Lezcano
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
> Hi Chuansheng,
>
> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote:
>
> > We found sometimes even after we let PM_QOS back to DEFAULT,
> > the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state
> > selection immediately after received the IPI interrupt.
> >
> > The code model is simply like below:
> > {
> > 	pm_qos_update_request(&pm_qos, C1 - 1);
> > 		< == Here keep all cores at C0
> > 	...;
> > 	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
> > 		< == Here some cores still stuck at C0 for 2-3s
> > }
> >
> > The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to
> > wake up the core, but when core is in poll idle state, the IPI interrupt
> > can not break the polling loop.

So seeing how you're from @intel.com I'm assuming you're using x86 here.

I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
just fine, which means we'll fall out of the cpuidle_enter(), which
means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().

It will indeed not leave the cpu_idle_loop() function and go right back
into cpuidle_idle_call(), but that will then call cpuidle_select() which
should pick a new C state.

So the interrupt _should_ work. If it doesn't you need to explain why.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Daniel Lezcano @ 2014-08-14 11:14 UTC
To: Peter Zijlstra
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
>> [...]
>>> The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to
>>> wake up the core, but when core is in poll idle state, the IPI interrupt
>>> can not break the polling loop.
>
> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>
> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> just fine, which means we'll fall out of the cpuidle_enter(), which
> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>
> It will indeed not leave the cpu_idle_loop() function and go right back
> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> should pick a new C state.
>
> So the interrupt _should_ work. If it doesn't you need to explain why.

I think the issue is related to the poll_idle state, in
drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
cpuidle table as the state 0 (POLL). There is no mwait for this state.
It is a bit confusing because this state is not listed in the acpi /
intel idle driver but inserted implicitly at the beginning of the idle
table by the cpuidle framework when the driver is registered.

static int poll_idle(struct cpuidle_device *dev,
		struct cpuidle_driver *drv, int index)
{
	local_irq_enable();
	if (!current_set_polling_and_test()) {
		while (!need_resched())
			cpu_relax();
	}
	current_clr_polling();

	return index;
}
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Liu, Chuansheng @ 2014-08-14 11:17 UTC
To: Daniel Lezcano, Peter Zijlstra
Cc: Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML, Liu, Changcheng,
	Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org]
> Sent: Thursday, August 14, 2014 7:15 PM
> To: Peter Zijlstra
> Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> > [...]
> > So the interrupt _should_ work. If it doesn't you need to explain why.
>
> I think the issue is related to the poll_idle state, in
> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state.
> It is a bit confusing because this state is not listed in the acpi /
> intel idle driver but inserted implicitly at the beginning of the idle
> table by the cpuidle framework when the driver is registered.

Yes, I am talking about the poll_idle() function, which does not use mwait.
If we want the reselection to happen immediately, we need to break the poll
while loop by setting the reschedule bit; we do not care whether a real
reschedule happens or not.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
From: Peter Zijlstra @ 2014-08-14 12:41 UTC
To: Daniel Lezcano
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
	changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote:
> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> > [...]
> > So the interrupt _should_ work. If it doesn't you need to explain why.
>
> I think the issue is related to the poll_idle state, in
> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state. It is
> a bit confusing because this state is not listed in the acpi / intel idle
> driver but inserted implicitly at the beginning of the idle table by the
> cpuidle framework when the driver is registered.
>
> static int poll_idle(struct cpuidle_device *dev,
> 		struct cpuidle_driver *drv, int index)
> {
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched())
> 			cpu_relax();
> 	}
> 	current_clr_polling();
>
> 	return index;
> }

Ah, well, in that case there's a ton more broken than just this.
kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty
much expects to be called after each interrupt.

Then again, not reflecting properly isn't really a problem, it's not like
not accounting interrupts is going to save power much.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 12:41 ` Peter Zijlstra
@ 2014-08-14 13:29 ` Daniel Lezcano
2014-08-14 13:57 ` Liu, Chuansheng
0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2014-08-14 13:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On 08/14/2014 02:41 PM, Peter Zijlstra wrote:
> On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote:
>> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>>> On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
>>>> Hi Chuansheng,
>>>>
>>>> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote:
>>>>
>>>>> We found sometimes even after we let PM_QOS back to DEFAULT,
>>>>> the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state
>>>>> selection immediately after received the IPI interrupt.
>>>>>
>>>>> The code model is simply like below:
>>>>> {
>>>>> 	pm_qos_update_request(&pm_qos, C1 - 1);
>>>>> 	< == Here keep all cores at C0
>>>>> 	...;
>>>>> 	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
>>>>> 	< == Here some cores still stuck at C0 for 2-3s
>>>>> }
>>>>>
>>>>> The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to
>>>>> wake up the core, but when core is in poll idle state, the IPI interrupt
>>>>> can not break the polling loop.
>>>
>>> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>>>
>>> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>>> just fine, which means we'll fall out of the cpuidle_enter(), which
>>> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>>>
>>> It will indeed not leave the cpu_idle_loop() function and go right back
>>> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>>> should pick a new C state.
>>>
>>> So the interrupt _should_ work. If it doesn't you need to explain why.
>>
>> I think the issue is related to the poll_idle state, in
>> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> cpuidle table as the state 0 (POLL). There is no mwait for this state. It is
>> a bit confusing because this state is not listed in the acpi / intel idle
>> driver but inserted implicitly at the beginning of the idle table by the
>> cpuidle framework when the driver is registered.
>>
>> static int poll_idle(struct cpuidle_device *dev,
>> 		struct cpuidle_driver *drv, int index)
>> {
>> 	local_irq_enable();
>> 	if (!current_set_polling_and_test()) {
>> 		while (!need_resched())
>> 			cpu_relax();
>> 	}
>> 	current_clr_polling();
>>
>> 	return index;
>> }
>
> Ah, well, in that case there's a ton more broken than just this.
> kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty
> much expects to be called after each interrupt.

Agree.

> Then again, not reflecting properly isn't really a problem, it's not like
> not accounting interrupts is going to save power much.

I think the main issue here is to exit the poll_idle loop when an IPI is
received. IIUC, there is a pm_qos user, perhaps a driver (Chuansheng can
give more details), setting a very short latency, so the cpuidle
framework chooses a shallow state like poll_idle; then the driver
sets a bigger latency, leading to the IPI that wakes all the CPUs. As the
CPUs are in poll_idle, they don't exit until an event makes them
leave the need_resched() loop (a reschedule or whatever). This situation
can leave the CPUs spinning in the polling loop for several seconds while we
are expecting them to exit poll_idle and enter a deeper idle state,
thus with extra energy consumption.

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
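The behaviour described in the message above can be modelled in user space. What follows is only a minimal sketch, not kernel code: every name in it (`model_cpu`, `model_event`, `model_poll_idle`) is hypothetical, and time is reduced to discrete ticks with one event per tick. It shows that an interrupt whose handler leaves the reschedule flag alone does not end the polling loop, while one that sets the flag does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds for this model; none of these exist in the kernel. */
enum model_event {
	MODEL_EV_NONE,            /* nothing happens on this tick */
	MODEL_EV_IPI,             /* IPI whose handler does nothing */
	MODEL_EV_IPI_SET_RESCHED, /* IPI whose handler also sets need_resched */
	MODEL_EV_RESCHED,         /* need_resched is set for some other reason */
};

struct model_cpu {
	bool need_resched;         /* stands in for TIF_NEED_RESCHED */
	unsigned int irqs_handled; /* interrupts serviced while polling */
};

/*
 * Model of the "while (!need_resched()) cpu_relax();" loop. One event is
 * delivered per tick and the loop condition is re-checked before each tick.
 * Returns the number of ticks spent polling before the loop exited or the
 * event list ran out.
 */
static size_t model_poll_idle(struct model_cpu *cpu,
			      const enum model_event *events, size_t n)
{
	size_t tick;

	assert(cpu != NULL && (events != NULL || n == 0));

	for (tick = 0; tick < n && !cpu->need_resched; tick++) {
		switch (events[tick]) {
		case MODEL_EV_IPI:
			/* The handler runs and returns; the loop keeps spinning. */
			cpu->irqs_handled++;
			break;
		case MODEL_EV_IPI_SET_RESCHED:
			/* The handler sets the flag the loop is testing. */
			cpu->irqs_handled++;
			cpu->need_resched = true;
			break;
		case MODEL_EV_RESCHED:
			cpu->need_resched = true;
			break;
		case MODEL_EV_NONE:
			break;
		}
	}
	return tick;
}
```

In this model a plain IPI on tick 1 is counted but changes nothing, so the CPU keeps polling until some unrelated reschedule arrives; an IPI that sets the flag ends the loop on the next check.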
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 13:29 ` Daniel Lezcano
@ 2014-08-14 13:57 ` Liu, Chuansheng
0 siblings, 0 replies; 18+ messages in thread
From: Liu, Chuansheng @ 2014-08-14 13:57 UTC (permalink / raw)
To: Daniel Lezcano, Peter Zijlstra
Cc: Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML, Liu, Changcheng,
Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org]
> Sent: Thursday, August 14, 2014 9:30 PM
> To: Peter Zijlstra
> Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> I think the main issue here is to exit the poll_idle loop when an IPI is
> received. IIUC, there is a pm_qos user, perhaps a driver (Chuansheng can
> give more details), setting a very short latency, so the cpuidle
> framework chooses a shallow state like poll_idle; then the driver
> sets a bigger latency, leading to the IPI that wakes all the CPUs. As the
> CPUs are in poll_idle, they don't exit until an event makes them
> leave the need_resched() loop (a reschedule or whatever). This situation
> can leave the CPUs spinning in the polling loop for several seconds while we
> are expecting them to exit poll_idle and enter a deeper idle state,
> thus with extra energy consumption.
>

Exactly, there is no functional error here. But not entering the deeper
C-state costs more power; in some MP3 standby modes, as much as 10% of the
power can be saved. That is the aim of this patch.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 11:14 ` Daniel Lezcano
2014-08-14 11:17 ` Liu, Chuansheng
2014-08-14 12:41 ` Peter Zijlstra
@ 2014-08-14 21:12 ` Andy Lutomirski
2014-08-14 21:16 ` Peter Zijlstra
2 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-08-14 21:12 UTC (permalink / raw)
To: Daniel Lezcano, Peter Zijlstra
Cc: Chuansheng Liu, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
changcheng.liu, xiaoming.wang, souvik.k.chakravarty

On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>>
>> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>>
>> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> just fine, which means we'll fall out of the cpuidle_enter(), which
>> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>>
>> It will indeed not leave the cpu_idle_loop() function and go right back
>> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> should pick a new C state.
>>
>> So the interrupt _should_ work. If it doesn't you need to explain why.
>
> I think the issue is related to the poll_idle state, in
> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state.
> It is a bit confusing because this state is not listed in the acpi /
> intel idle driver but inserted implicitly at the beginning of the idle
> table by the cpuidle framework when the driver is registered.
>
> static int poll_idle(struct cpuidle_device *dev,
> 		struct cpuidle_driver *drv, int index)
> {
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched())
> 			cpu_relax();
> 	}
> 	current_clr_polling();
>
> 	return index;
> }

As the most recent person to have modified this function, and as an
avowed hater of pointless IPIs, let me ask a rather different question:
why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
supports the polling idle interface :)

Can't you just do:

if (set_nr_if_polling(rq->idle)) {
	trace_sched_wake_idle_without_ipi(cpu);
} else {
	raw_spin_lock_irqsave(&rq->lock, flags);
	if (rq->curr == rq->idle)
		smp_send_reschedule(cpu);
	// else the CPU wasn't idle; nothing to do
	raw_spin_unlock_irqrestore(&rq->lock, flags);
}

In the common case (wake from C0, i.e. polling idle), this will skip the
IPI entirely unless you race with idle entry/exit, saving a few more
precious electrons and all of the latency involved in poking the APIC
registers.

--Andy

P.S. "30mV" in the patch description is presumably a typo.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 21:12 ` Andy Lutomirski
@ 2014-08-14 21:16 ` Peter Zijlstra
2014-08-14 21:22 ` Andy Lutomirski
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2014-08-14 21:16 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Daniel Lezcano, Chuansheng Liu, Rafael J. Wysocki,
linux-pm@vger.kernel.org, LKML, changcheng.liu, xiaoming.wang,
souvik.k.chakravarty

On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >>
> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
> >>
> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >>
> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> should pick a new C state.
> >>
> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> >
> > I think the issue is related to the poll_idle state, in
> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> > It is a bit confusing because this state is not listed in the acpi /
> > intel idle driver but inserted implicitly at the beginning of the idle
> > table by the cpuidle framework when the driver is registered.
> >
> > static int poll_idle(struct cpuidle_device *dev,
> > 		struct cpuidle_driver *drv, int index)
> > {
> > 	local_irq_enable();
> > 	if (!current_set_polling_and_test()) {
> > 		while (!need_resched())
> > 			cpu_relax();
> > 	}
> > 	current_clr_polling();
> >
> > 	return index;
> > }
>
> As the most recent person to have modified this function, and as an
> avowed hater of pointless IPIs, let me ask a rather different question:
> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
> supports the polling idle interface :)
>
> Can't you just do:
>
> if (set_nr_if_polling(rq->idle)) {
> 	trace_sched_wake_idle_without_ipi(cpu);
> } else {
> 	raw_spin_lock_irqsave(&rq->lock, flags);
> 	if (rq->curr == rq->idle)
> 		smp_send_reschedule(cpu);
> 	// else the CPU wasn't idle; nothing to do
> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
>
> In the common case (wake from C0, i.e. polling idle), this will skip the
> IPI entirely unless you race with idle entry/exit, saving a few more
> precious electrons and all of the latency involved in poking the APIC
> registers.

They could and they probably should, but that logic should _not_ live in
the cpuidle driver.

And as stated elsewhere in the thread; they also need to fix their
kick_all_cpus_sync() usage, because that's similarly wrecked.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 21:16 ` Peter Zijlstra
@ 2014-08-14 21:22 ` Andy Lutomirski
2014-08-15 1:21 ` Liu, Chuansheng
0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-08-14 21:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Daniel Lezcano, Chuansheng Liu, Rafael J. Wysocki,
linux-pm@vger.kernel.org, LKML, changcheng.liu, xiaoming.wang,
souvik.k.chakravarty

On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
>> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
>> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> >>
>> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>> >>
>> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> >> just fine, which means we'll fall out of the cpuidle_enter(), which
>> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> >>
>> >> It will indeed not leave the cpu_idle_loop() function and go right back
>> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> >> should pick a new C state.
>> >>
>> >> So the interrupt _should_ work. If it doesn't you need to explain why.
>> >
>> > I think the issue is related to the poll_idle state, in
>> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
>> > It is a bit confusing because this state is not listed in the acpi /
>> > intel idle driver but inserted implicitly at the beginning of the idle
>> > table by the cpuidle framework when the driver is registered.
>> >
>> > static int poll_idle(struct cpuidle_device *dev,
>> > 		struct cpuidle_driver *drv, int index)
>> > {
>> > 	local_irq_enable();
>> > 	if (!current_set_polling_and_test()) {
>> > 		while (!need_resched())
>> > 			cpu_relax();
>> > 	}
>> > 	current_clr_polling();
>> >
>> > 	return index;
>> > }
>>
>> As the most recent person to have modified this function, and as an
>> avowed hater of pointless IPIs, let me ask a rather different question:
>> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
>> supports the polling idle interface :)
>>
>> Can't you just do:
>>
>> if (set_nr_if_polling(rq->idle)) {
>> 	trace_sched_wake_idle_without_ipi(cpu);
>> } else {
>> 	raw_spin_lock_irqsave(&rq->lock, flags);
>> 	if (rq->curr == rq->idle)
>> 		smp_send_reschedule(cpu);
>> 	// else the CPU wasn't idle; nothing to do
>> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
>> }
>>
>> In the common case (wake from C0, i.e. polling idle), this will skip the
>> IPI entirely unless you race with idle entry/exit, saving a few more
>> precious electrons and all of the latency involved in poking the APIC
>> registers.
>
> They could and they probably should, but that logic should _not_ live in
> the cpuidle driver.

Sure. My point is that fixing the IPI handler is, I think, totally
bogus, because the IPI API isn't the right way to do this at all.

It would be straightforward to add a new function wake_if_idle(int
cpu) to sched/core.c.

--Andy
* RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-14 21:22 ` Andy Lutomirski
@ 2014-08-15 1:21 ` Liu, Chuansheng
2014-08-15 1:27 ` Andy Lutomirski
0 siblings, 1 reply; 18+ messages in thread
From: Liu, Chuansheng @ 2014-08-15 1:21 UTC (permalink / raw)
To: Andy Lutomirski, Peter Zijlstra
Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm@vger.kernel.org, LKML,
Liu, Changcheng, Wang, Xiaoming, Chakravarty, Souvik K

> -----Original Message-----
> From: Andy Lutomirski [mailto:luto@amacapital.net]
> Sent: Friday, August 15, 2014 5:23 AM
> To: Peter Zijlstra
> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
> linux-pm@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
> Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >> >>
> >> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
> >> >>
> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >> >>
> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> >> should pick a new C state.
> >> >>
> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> >> >
> >> > I think the issue is related to the poll_idle state, in
> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> >> > It is a bit confusing because this state is not listed in the acpi /
> >> > intel idle driver but inserted implicitly at the beginning of the idle
> >> > table by the cpuidle framework when the driver is registered.
> >> >
> >> > static int poll_idle(struct cpuidle_device *dev,
> >> > 		struct cpuidle_driver *drv, int index)
> >> > {
> >> > 	local_irq_enable();
> >> > 	if (!current_set_polling_and_test()) {
> >> > 		while (!need_resched())
> >> > 			cpu_relax();
> >> > 	}
> >> > 	current_clr_polling();
> >> >
> >> > 	return index;
> >> > }
> >>
> >> As the most recent person to have modified this function, and as an
> >> avowed hater of pointless IPIs, let me ask a rather different question:
> >> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
> >> supports the polling idle interface :)
> >>
> >> Can't you just do:
> >>
> >> if (set_nr_if_polling(rq->idle)) {
> >> 	trace_sched_wake_idle_without_ipi(cpu);
> >> } else {
> >> 	raw_spin_lock_irqsave(&rq->lock, flags);
> >> 	if (rq->curr == rq->idle)
> >> 		smp_send_reschedule(cpu);
> >> 	// else the CPU wasn't idle; nothing to do
> >> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
> >> }
> >>
> >> In the common case (wake from C0, i.e. polling idle), this will skip the
> >> IPI entirely unless you race with idle entry/exit, saving a few more
> >> precious electrons and all of the latency involved in poking the APIC
> >> registers.
> >
> > They could and they probably should, but that logic should _not_ live in
> > the cpuidle driver.
>
> Sure. My point is that fixing the IPI handler is, I think, totally
> bogus, because the IPI API isn't the right way to do this at all.
>
> It would be straightforward to add a new function wake_if_idle(int
> cpu) to sched/core.c.
>

Thanks, Andy and Peter, for the suggestion; it will save some IPIs in case
the cores are not idle.

There is a similar API in sched/core.c, wake_up_idle_cpu(), so we just
need to add one new common SMP API:

void smp_wake_up_cpus(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		wake_up_idle_cpu(cpu);
}

I will try a patch for it.
* Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT
2014-08-15 1:21 ` Liu, Chuansheng
@ 2014-08-15 1:27 ` Andy Lutomirski
0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-08-15 1:27 UTC (permalink / raw)
To: Liu, Chuansheng
Cc: Peter Zijlstra, Daniel Lezcano, Rafael J. Wysocki,
linux-pm@vger.kernel.org, LKML, Liu, Changcheng, Wang, Xiaoming,
Chakravarty, Souvik K

On Thu, Aug 14, 2014 at 6:21 PM, Liu, Chuansheng
<chuansheng.liu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:luto@amacapital.net]
>> Sent: Friday, August 15, 2014 5:23 AM
>> To: Peter Zijlstra
>> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
>> linux-pm@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
>> Chakravarty, Souvik K
>> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
>> back to DEFAULT
>>
>> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
>> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
>> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> >> >>
>> >> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>> >> >>
>> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
>> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> >> >>
>> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
>> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> >> >> should pick a new C state.
>> >> >>
>> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
>> >> >
>> >> > I think the issue is related to the poll_idle state, in
>> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
>> >> > It is a bit confusing because this state is not listed in the acpi /
>> >> > intel idle driver but inserted implicitly at the beginning of the idle
>> >> > table by the cpuidle framework when the driver is registered.
>> >> >
>> >> > static int poll_idle(struct cpuidle_device *dev,
>> >> > 		struct cpuidle_driver *drv, int index)
>> >> > {
>> >> > 	local_irq_enable();
>> >> > 	if (!current_set_polling_and_test()) {
>> >> > 		while (!need_resched())
>> >> > 			cpu_relax();
>> >> > 	}
>> >> > 	current_clr_polling();
>> >> >
>> >> > 	return index;
>> >> > }
>> >>
>> >> As the most recent person to have modified this function, and as an
>> >> avowed hater of pointless IPIs, let me ask a rather different question:
>> >> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
>> >> supports the polling idle interface :)
>> >>
>> >> Can't you just do:
>> >>
>> >> if (set_nr_if_polling(rq->idle)) {
>> >> 	trace_sched_wake_idle_without_ipi(cpu);
>> >> } else {
>> >> 	raw_spin_lock_irqsave(&rq->lock, flags);
>> >> 	if (rq->curr == rq->idle)
>> >> 		smp_send_reschedule(cpu);
>> >> 	// else the CPU wasn't idle; nothing to do
>> >> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
>> >> }
>> >>
>> >> In the common case (wake from C0, i.e. polling idle), this will skip the
>> >> IPI entirely unless you race with idle entry/exit, saving a few more
>> >> precious electrons and all of the latency involved in poking the APIC
>> >> registers.
>> >
>> > They could and they probably should, but that logic should _not_ live in
>> > the cpuidle driver.
>>
>> Sure. My point is that fixing the IPI handler is, I think, totally
>> bogus, because the IPI API isn't the right way to do this at all.
>>
>> It would be straightforward to add a new function wake_if_idle(int
>> cpu) to sched/core.c.
>>
> Thanks, Andy and Peter, for the suggestion; it will save some IPIs in case
> the cores are not idle.

This isn't quite right. Using the polling interface correctly will
save IPIs in case the core *is* idle. But, given that you are trying
to upgrade the chosen idle state, I don't think you need to kick
non-idle CPUs at all, and my example contains that optimization.

Presumably the function should be named something like wake_up_if_idle.

>
> There is a similar API in sched/core.c, wake_up_idle_cpu(), so we just
> need to add one new common SMP API:
>
> void smp_wake_up_cpus(void)
> {
> 	int cpu;
>
> 	for_each_online_cpu(cpu)
> 		wake_up_idle_cpu(cpu);
> }
>
> I will try a patch for it.

This will have lots of extra overhead if the cpu is *not* idle. I
think my example will be a lot more efficient.

--Andy
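The efficiency argument in the message above can be made concrete with a small counting model. It is only a sketch: the three CPU states and both functions are hypothetical, and it assumes that a blanket kick interrupts every CPU that is not currently polling, whether it is busy or sleeping. Under that assumption, the idle-only variant saves one interrupt per busy CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU states for this model. */
enum model_state {
	MODEL_BUSY,           /* running a normal task */
	MODEL_IDLE_POLLING,   /* idle, spinning on need_resched */
	MODEL_IDLE_SLEEPING,  /* idle in a state that needs an interrupt to leave */
};

/*
 * Assumed behaviour of kicking every online CPU: a poller sees the flag
 * without an interrupt, every other CPU gets one, busy or not.
 */
static unsigned int model_ipis_kick_all(const enum model_state *cpus, size_t n)
{
	unsigned int ipis = 0;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (cpus[i] != MODEL_IDLE_POLLING)
			ipis++;
	return ipis;
}

/* Idle-only wakeup: busy CPUs are skipped, pollers need no interrupt. */
static unsigned int model_ipis_wake_if_idle(const enum model_state *cpus,
					    size_t n)
{
	unsigned int ipis = 0;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (cpus[i] == MODEL_IDLE_SLEEPING)
			ipis++;
	return ipis;
}
```

For a four-CPU system with two busy CPUs, one poller, and one sleeper, the blanket kick counts three interrupts in this model and the idle-only variant counts one.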
end of thread, other threads:[~2014-08-15 1:27 UTC | newest]
Thread overview: 18+ messages
2014-08-14 2:11 [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT Chuansheng Liu
[not found] ` <CAKnoXLw3DrBAxCUWEkXtvCTf+E1w0xTHJSiSUY6Qd6xHXeGaoQ@mail.gmail.com>
2014-08-14 10:53 ` Peter Zijlstra
2014-08-14 11:24 ` Liu, Chuansheng
2014-08-14 13:13 ` Peter Zijlstra
2014-08-14 14:10 ` Liu, Chuansheng
2014-08-14 14:17 ` Daniel Lezcano
2014-08-14 14:26 ` Liu, Chuansheng
2014-08-14 11:00 ` Peter Zijlstra
2014-08-14 11:14 ` Daniel Lezcano
2014-08-14 11:17 ` Liu, Chuansheng
2014-08-14 12:41 ` Peter Zijlstra
2014-08-14 13:29 ` Daniel Lezcano
2014-08-14 13:57 ` Liu, Chuansheng
2014-08-14 21:12 ` Andy Lutomirski
2014-08-14 21:16 ` Peter Zijlstra
2014-08-14 21:22 ` Andy Lutomirski
2014-08-15 1:21 ` Liu, Chuansheng
2014-08-15 1:27 ` Andy Lutomirski