linuxppc-dev.lists.ozlabs.org archive mirror
From: Shreyas B Prabhu <shreyas@linux.vnet.ibm.com>
To: Balbir Singh <bsingharora@gmail.com>, rjw@rjwysocki.net
Cc: linux-pm@vger.kernel.org, daniel.lezcano@linaro.org,
	anton@samba.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] cpuidle/powernv: Fix snooze timeout
Date: Thu, 23 Jun 2016 15:11:54 +0530
Message-ID: <576BAEE2.4060202@linux.vnet.ibm.com>
In-Reply-To: <576BABC5.7020600@gmail.com>



On 06/23/2016 02:58 PM, Balbir Singh wrote:
> 
> 
> On 23/06/16 14:58, Shreyas B Prabhu wrote:
>>
>>
>> On 06/23/2016 05:18 AM, Balbir Singh wrote:
>>>
>>>
>>> On 23/06/16 05:36, Shreyas B. Prabhu wrote:
>>>> Snooze is a poll idle state in powernv and pseries platforms. Snooze
>>>> has a timeout so that if a cpu stays in snooze for more than target
>>>> residency of the next available idle state, then it would exit thereby
>>>> giving chance to the cpuidle governor to re-evaluate and
>>>> promote the cpu to a deeper idle state. Therefore whenever snooze exits
>>>> due to this timeout, its last_residency will be target_residency of next
>>>> deeper state.
>>>>
>>>> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
>>>> changed the math around last_residency calculation. Specifically, while
>>>> converting last_residency value from nanoseconds to microseconds it does
>>>> right shift by 10. Due to this, in snooze timeout exit scenarios
>>>> last_residency calculated is roughly 2.3% less than target_residency of
>>>> next available state. This pattern is picked up by get_typical_interval()
>>>> in the menu governor and therefore expected_interval in menu_select() is
>>>> frequently less than the target_residency of any state but snooze.
>>>>
>>>> Due to this we are entering snooze at a higher rate, thereby affecting
>>>> the single thread performance.
>>>> Since the math around last_residency is not meant to be precise, fix this
>>>> issue by setting snooze timeout to 105% of target_residency of next
>>>> available idle state.
>>>>
>>>> This also adds a comment explaining why the snooze timeout is necessary.
>>>>
>>>> Reported-by: Anton Blanchard <anton@samba.org>
>>>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>>>> ---
>>>>  drivers/cpuidle/cpuidle-powernv.c | 14 ++++++++++++++
>>>>  drivers/cpuidle/cpuidle-pseries.c | 13 +++++++++++++
>>>>  2 files changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
>>>> index e12dc30..5835491 100644
>>>> --- a/drivers/cpuidle/cpuidle-powernv.c
>>>> +++ b/drivers/cpuidle/cpuidle-powernv.c
>>>> @@ -268,10 +268,24 @@ static int powernv_idle_probe(void)
>>>>  		cpuidle_state_table = powernv_states;
>>>>  		/* Device tree can indicate more idle states */
>>>>  		max_idle_state = powernv_add_idle_states();
>>>> +
>>>> +		/*
>>>> +		 * Staying in snooze for a long period can degrade the
>>>> +		 * performance of the sibling cpus. Set timeout for snooze such
>>>> +		 * that if the cpu stays in snooze longer than target residency
>>>> +		 * of the next available idle state then exit from snooze. This
>>>> +		 * gives a chance to the cpuidle governor to re-evaluate and
>>>> +		 * promote it to deeper idle states.
>>>> +		 */
>>>>  		if (max_idle_state > 1) {
>>>>  			snooze_timeout_en = true;
>>>>  			snooze_timeout = powernv_states[1].target_residency *
>>>>  					 tb_ticks_per_usec;
>>>> +			/*
>>>> +			 * Give a 5% margin since target residency related math
>>>> +			 * is not precise in cpuidle core.
>>>> +			 */
>>>
>>> Is this due to the microsecond conversion mentioned above? It would be nice to
>>> have that in the comment. Does
>>>
>>> (powernv_states[1].target_residency + tb_ticks_per_usec) / tb_ticks_per_usec
>>>
>>> solve your rounding issues, assuming the issue is really rounding? Or is it due
>>> to the shift by 10? Could you please elaborate on which related math is not
>>> precise? That would help me understand what I missed in your changes.
>>>
>>>> +			snooze_timeout += snooze_timeout / 20;
>>>
>>> For now 5% is sufficient, but do you want to add an assert to check that
>>>
>>> snooze_timeout (in microseconds) / tb_ticks_per_usec > powernv_states[i].target_residency?
>>>
>>
>> This is not a rounding issue. As I mentioned in the commit message, this
>> is because of the last_residency calculation in cpuidle.c.
>> To elaborate, last residency calculation is done in the following way
>> after commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with
>> local_clock()") -
>>
>> cpuidle_enter_state()
>> {
>> 	[...]
>> 	time_start = local_clock();
>> 	[enter idle state]
>> 	time_end = local_clock();
>> 	/*
>> 	 * local_clock() returns the time in nanosecond, let's shift
>> 	 * by 10 (divide by 1024) to have microsecond based time.
>> 	 */
>> 	diff = (time_end - time_start) >> 10;
>> 	[...]
>> 	dev->last_residency = (int) diff;
>> }
>>
>> Because of >>10 as opposed to /1000, last_residency is about 2.3% lower
>> than the actual residency in microseconds.
> 
> 
> This is still a rounding error but at a different site. I see we saved
> a division by doing a >> 10, but we added it right back by doing a /20
> later in the platform code. 

While a >> 10 is done at every idle exit, div by 20 is done once during
boot, so this doesn't negate the previous optimization.
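
To make the 2.3% figure concrete, here is a small standalone sketch (not
kernel code) that compares the shift-based conversion with an exact one and
applies the same 5% margin as the patch. The helper names and the 100 us
target residency are assumptions chosen only for illustration, and the
time taken to actually leave snooze is ignored.

```c
#include <stdint.h>

/* Conversion used in cpuidle_enter_state(): ns >> 10 divides by 1024. */
static inline uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}

/* Exact nanosecond to microsecond conversion, for comparison. */
static inline uint64_t ns_to_us_exact(uint64_t ns)
{
	return ns / 1000;
}

/* Same margin as the patch: timeout += timeout / 20 (5%). */
static inline uint64_t with_margin(uint64_t timeout)
{
	return timeout + timeout / 20;
}
```

With a hypothetical target residency of 100 us, a snooze that lasts exactly
100000 ns is reported as 97 us by the shift, which is below the target. With
the margin applied the snooze lasts 105000 ns and is reported as 102 us,
which is no longer below the target.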

> Shouldn't the rounding affect other
> platforms as well? Can't we fix it in cpuidle_enter_state()?

This does affect all platforms, but I'm guessing no other place relied
on the precision of last_residency calculations.
Daniel can perhaps comment on this.


> Division
> by 1000 can be optimized if required (but I would rather not add that complexity).
> Thanks for patiently explaining this.
> 
> Balbir
> 
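
For reference, one common way to divide by 1000 without a divide instruction
is to multiply by a precomputed reciprocal and shift. The sketch below is
only an illustration of that general technique, not something proposed in
this thread: the function name is hypothetical, and it assumes the
nanosecond value fits in 32 bits (about 4.29 seconds), which the 64-bit
difference in cpuidle_enter_state() does not guarantee.

```c
#include <stdint.h>

/*
 * Hypothetical helper: exact ns / 1000 for 32-bit inputs.
 * 274877907 is ceil(2^38 / 1000), so multiplying by it and
 * shifting right by 38 yields the same quotient as a division
 * by 1000 for every 32-bit unsigned value.
 */
static inline uint32_t ns_to_us_mulshift(uint32_t ns)
{
	return (uint32_t)(((uint64_t)ns * 274877907ULL) >> 38);
}
```

Compilers typically emit an equivalent sequence on their own when the
divisor is a compile-time constant, so a plain / 1000 in C may already avoid
a hardware divide.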

