From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Linux PM list <linux-pm@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Len Brown <len.brown@intel.com>
Subject: Re: [PATCH] cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c
Date: Wed, 27 May 2015 21:49:54 +0530 [thread overview]
Message-ID: <5565EEAA.8010900@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAJZ5v0hn__7KRqOTpBYAk8EfV28xD54G0MaetaJ6HH5uUE3wQw@mail.gmail.com>
On 05/27/2015 07:27 PM, Rafael J. Wysocki wrote:
> On Wed, May 27, 2015 at 2:25 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
>>>
>>> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
>>>>
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
>>>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
>>>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
>>>> entry in the cpuidle driver's table of states is overwritten with
>>>> the default "poll" entry by the core. The "state" defined by the
>>>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
>>>> callbacks and its exit_latency is 0.
>>>>
>>>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
>>>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
>>>> will be skipped by the loop) and in find_deepest_state() (since
>>>> exit_latency is 0, the "poll state" will become the default if the
>>>> "s->exit_latency <= latency_req" check is replaced with
>>>> "s->exit_latency < latency_req" which may only matter for drivers
>>>> providing different states with the same exit_latency).
>>>>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>> ---
>>>> drivers/cpuidle/cpuidle.c | 8 ++++----
>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> <snip>
>>>
>>>>
>>>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>>>> bool freeze)
>>>> {
>>>> unsigned int latency_req = 0;
>>>> - int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
>>>> + int i, ret = -ENXIO;
>>>>
>>>> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>>>> + for (i = 0; i < drv->state_count; i++) {
>>>> struct cpuidle_state *s = &drv->states[i];
>>>> struct cpuidle_state_usage *su = &dev->states_usage[i];
>>>>
>>>> - if (s->disabled || su->disable || s->exit_latency <=
>>>> latency_req
>>>> + if (s->disabled || su->disable || s->exit_latency <
>>>> latency_req
>>>
>>>
>>> Prior to this patch,
>>>
>>> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
>>> whose first idle state has an exit_latency of 0, find_deepest_state()
>>> would return -1 if it failed to find a deeper idle state.
>>> But as an effect of this patch, find_deepest_state() returns 0 in the
>>> above circumstance.
>>
>>
>> Except I am missing something, with an exit_latency = 0, the state will be
>> never selected, because of the "s->exit_latency < latency_req" condition
>> (strictly greater than).
>
> No, this is the condition to skip the state, so previously it wouldn't
> be selected, but after the patch it will.
>
> So yes, the patch changes behavior for systems with
> CONFIG_ARCH_HAS_CPU_RELAX unset.
>
>>> My concern is if these drivers do not intend to enter a polling state
>>> during suspend, this will cause an issue, won't it?
>
> The change in behavior happens for architectures where
> CONFIG_ARCH_HAS_CPU_RELAX is not set. In those cases the 0-index
> state is supposed to be provided by the driver. Is there a reason to
> expect that this may not be a genuine idle state?
On PowerPC, we have the 0-index idle state, whose exit_latency is 0 and
all that the CPU does in this state, is poll on need_resched(), except
at a lower priority from the hardware's standpoint. Nevertheless, the
CPU is busy polling. So, I would not consider it a genuine idle state.
On a side note, we do not yet support suspend on Power servers, but we
may in the future. Hence the concern.
>> Definitively poll can cause thermal issues, especially when suspending. It
>> is a dangerous state (let's imagine you close your laptop => suspend/poll
>> and then put it in your bag for a travel).
>
> With ARCH_HAS_CPU_RELAX set the "poll state" is supposed to be thermally safe.
>
>> I don't think with the code above we can reach this situation but I agree
>> this is something we have to take care carefully.
>>
>> Actually, I am in favour of removing poll at all from the cpuidle driver and
>> poll only when a cpuidle state selection fails under certain condition.
>>
>> So I fully agree with your statement below.
>>
>>> I would expect the cpus to be in a hardware
>>> defined idle state.
>
> Well, except for the degenerate case in which all of them are disabled
> and for the "broadcast timer stopping aviodance" use case for
> find_deepest_state().
During suspend, the CPUs can very well enter states where the timer
stops since we stop timer interrupts anyway. So unless the idle states
are explicitly disabled by the user/hardware for some reason, deeper
idle states will still be available during suspend as far as I can see.
>
> So there are two questions in my view:
> (1) Should find_deepest_state() ever return states with exit_latency equal to 0?
I would say no, since such an idle state would mostly be polling on a
wakeup event. Atleast, there is one such case in PowerPC.
> (2) If the answer to (1) is "yes", should the "poll state" be ever
> returned by find_deepest_state()?
>
> In any case, find_deepest_state() will only return a state with
> exit_latency equal to 0 if there's no other choice and if it returns
> nothing, we'll fall back to the architecture idle method. So the
> question really is whether or not falling back to arch idle is any
> better than using any state we have in the table.
My suggestion is to *not* fall back to arch idle code, since that is a
black box from the core cpuidle's perspective.
>
> The patch is based on the assumption that any state from the table
> will be better than arch idle, including the "polling state". If that
> does not hold, we'll need to rethink a couple of other things in my
> view.
We could fail suspend if a non-polling idle state is not available perhaps ?
Regards
Preeti U Murthy
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2015-05-27 16:20 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-27 1:36 [PATCH] cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c Rafael J. Wysocki
2015-05-27 8:22 ` Daniel Lezcano
2015-05-27 11:31 ` Preeti U Murthy
2015-05-27 12:25 ` Daniel Lezcano
2015-05-27 13:57 ` Rafael J. Wysocki
2015-05-27 16:19 ` Preeti U Murthy [this message]
2015-05-28 0:51 ` Rafael J. Wysocki
2015-05-28 2:09 ` [PATCH v2] " Rafael J. Wysocki
2015-05-28 1:58 ` Preeti U Murthy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5565EEAA.8010900@linux.vnet.ibm.com \
--to=preeti@linux.vnet.ibm.com \
--cc=daniel.lezcano@linaro.org \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rjw@rjwysocki.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.