* [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24 8:23 Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-24 8:23 UTC (permalink / raw)
To: rjw
Cc: daniel.lezcano, linux-pm, linuxppc-dev, anton, mpe, bsingharora,
Shreyas B. Prabhu
Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.
commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.
Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.
Fix this by replacing right shift by 10 with /1000 while calculating
last_residency.
Reported-by: Anton Blanchard <anton@samba.org>
Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v2
=============
- Fixing it in the cpuidle core code instead of driver code.
drivers/cpuidle/cpuidle.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..30d67a8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
local_irq_enable();
/*
- * local_clock() returns the time in nanosecond, let's shift
- * by 10 (divide by 1024) to have microsecond based time.
+ * local_clock() returns the time in nanosecond, let's
+ * divide by 1000 to have microsecond based time.
*/
- diff = (time_end - time_start) >> 10;
+ diff = (time_end - time_start) / 1000;
if (diff > INT_MAX)
diff = INT_MAX;
--
2.1.4
^ permalink raw reply related [flat|nested] 11+ messages in thread
* RE: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 9:30 ` kbuild test robot
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: David Laight @ 2016-06-24 9:00 UTC (permalink / raw)
To: 'Shreyas B. Prabhu', rjw@rjwysocki.net
Cc: daniel.lezcano@linaro.org, linuxppc-dev@lists.ozlabs.org,
anton@samba.org, linux-pm@vger.kernel.org
From: Shreyas B. Prabhu
> Sent: 24 June 2016 09:24
>
> Snooze is a poll idle state in powernv and pseries platforms. Snooze
> has a timeout so that if a cpu stays in snooze for more than target
> residency of the next available idle state, then it would exit thereby
> giving chance to the cpuidle governor to re-evaluate and
> promote the cpu to a deeper idle state. Therefore whenever snooze exits
> due to this timeout, its last_residency will be target_residency of next
> deeper state.
>
> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
> changed the math around last_residency calculation. Specifically, while
> converting last_residency value from nanoseconds to microseconds it does
> right shift by 10. Due to this, in snooze timeout exit scenarios
> last_residency calculated is roughly 2.3% less than target_residency of
> next available state. This pattern is picked up get_typical_interval()
> in the menu governor and therefore expected_interval in menu_select() is
> frequently less than the target_residency of any state but snooze.
>
> Due to this we are entering snooze at a higher rate, thereby affecting
> the single thread performance.
>
> Fix this by replacing right shift by 10 with /1000 while calculating
> last_residency.
>
> Reported-by: Anton Blanchard <anton@samba.org>
> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---
> Changes in v2
> =============
> - Fixing it in the cpuidle core code instead of driver code.
>
> drivers/cpuidle/cpuidle.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a4d0059..30d67a8 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> local_irq_enable();
>
> /*
> - * local_clock() returns the time in nanosecond, let's shift
> - * by 10 (divide by 1024) to have microsecond based time.
> + * local_clock() returns the time in nanosecond, let's
> + * divide by 1000 to have microsecond based time.
> */
> - diff = (time_end - time_start) >> 10;
> + diff = (time_end - time_start) / 1000;
> if (diff > INT_MAX)
> diff = INT_MAX;
The intent of the >> 10 was probably to avoid an expensive 64bit divide.
So maybe something like:
diff = time_end - time_start;
if (diff >= INT_MAX/2)
diff_32 = INT_MAX/2/1000;
else
diff_32 = diff;
diff_32 += diff_32 >> 6;
diff_32 >>= 10;
}
Adding an extra 1/32 makes the division by be something slightly below 1000.
David
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
@ 2016-06-24 9:30 ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 9:30 UTC (permalink / raw)
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 900 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24898 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 9:00 ` David Laight
@ 2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
1 sibling, 0 replies; 11+ messages in thread
From: Daniel Lezcano @ 2016-06-24 10:05 UTC (permalink / raw)
To: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net
Cc: linux-pm@vger.kernel.org, anton@samba.org,
linuxppc-dev@lists.ozlabs.org
On 06/24/2016 11:00 AM, David Laight wrote:
> From: Shreyas B. Prabhu
>> Sent: 24 June 2016 09:24
>>
>> Snooze is a poll idle state in powernv and pseries platforms. Snooze
>> has a timeout so that if a cpu stays in snooze for more than target
>> residency of the next available idle state, then it would exit thereby
>> giving chance to the cpuidle governor to re-evaluate and
>> promote the cpu to a deeper idle state. Therefore whenever snooze exits
>> due to this timeout, its last_residency will be target_residency of next
>> deeper state.
>>
>> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
>> changed the math around last_residency calculation. Specifically, while
>> converting last_residency value from nanoseconds to microseconds it does
>> right shift by 10. Due to this, in snooze timeout exit scenarios
>> last_residency calculated is roughly 2.3% less than target_residency of
>> next available state. This pattern is picked up get_typical_interval()
>> in the menu governor and therefore expected_interval in menu_select() is
>> frequently less than the target_residency of any state but snooze.
>>
>> Due to this we are entering snooze at a higher rate, thereby affecting
>> the single thread performance.
>>
>> Fix this by replacing right shift by 10 with /1000 while calculating
>> last_residency.
>>
>> Reported-by: Anton Blanchard <anton@samba.org>
>> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>> ---
>> Changes in v2
>> =============
>> - Fixing it in the cpuidle core code instead of driver code.
>>
>> drivers/cpuidle/cpuidle.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a4d0059..30d67a8 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>> local_irq_enable();
>>
>> /*
>> - * local_clock() returns the time in nanosecond, let's shift
>> - * by 10 (divide by 1024) to have microsecond based time.
>> + * local_clock() returns the time in nanosecond, let's
>> + * divide by 1000 to have microsecond based time.
>> */
>> - diff = (time_end - time_start) >> 10;
>> + diff = (time_end - time_start) / 1000;
do_div ?
>> if (diff > INT_MAX)
>> diff = INT_MAX;
>
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> diff = time_end - time_start;
> if (diff >= INT_MAX/2)
> diff_32 = INT_MAX/2/1000;
> else
> diff_32 = diff;
> diff_32 += diff_32 >> 6;
> diff_32 >>= 10;
> }
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
@ 2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 16:01 ` Shreyas B Prabhu
1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 10:11 UTC (permalink / raw)
To: linuxppc-dev
Cc: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> diff = time_end - time_start;
> if (diff >= INT_MAX/2)
> diff_32 = INT_MAX/2/1000;
> else
> diff_32 = diff;
> diff_32 += diff_32 >> 6;
> diff_32 >>= 10;
> }
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.
Why not change the definition of the time to nanoseconds and update
the users accordingly?
I see that cpuidle_enter_state() writes to the last_residency value,
and it is read in three places: ladder_select_state(), menu_update()
and show_state_time() (for sysfs).
If those functions are called less often than cpuidle_enter_state(),
we could just move the division there. Since the divisor is constant,
do_div() can convert it into a multiply and shift, or we could use
your the code you suggest above, or use a 32-bit division most of
the time:
if (diff <= UINT_MAX)
diff_32 = (u32)diff / NSECS_PER_USEC;
else
diff_32 = div_u64(diff, NSECS_PER_USEC;
which gcc itself will turn into a multiplication or series of
shifts on CPUs on which that is faster.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
2016-06-24 9:30 ` kbuild test robot
@ 2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 10:27 UTC (permalink / raw)
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'
vim +242 drivers/cpuidle/cpuidle.c
56cfbf74a Colin Cross 2012-05-07 236 dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross 2012-05-07 237 } else {
56cfbf74a Colin Cross 2012-05-07 238 dev->last_residency = 0;
56cfbf74a Colin Cross 2012-05-07 239 }
56cfbf74a Colin Cross 2012-05-07 240
56cfbf74a Colin Cross 2012-05-07 241 return entered_state;
56cfbf74a Colin Cross 2012-05-07 @242 }
56cfbf74a Colin Cross 2012-05-07 243
56cfbf74a Colin Cross 2012-05-07 244 /**
907e30f1b Daniel Lezcano 2014-03-03 245 * cpuidle_select - ask the cpuidle framework to choose an idle state
:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state
:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16948 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
` (2 preceding siblings ...)
2016-06-24 10:27 ` kbuild test robot
@ 2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 12:10 UTC (permalink / raw)
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod'
vim +224 drivers/cpuidle/cpuidle.c
218 local_irq_enable();
219
220 /*
221 * local_clock() returns the time in nanosecond, let's
222 * divide by 1000 to have microsecond based time.
223 */
> 224 diff = (time_end - time_start) / 1000;
225 if (diff > INT_MAX)
226 diff = INT_MAX;
227
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23385 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 10:11 ` Arnd Bergmann
@ 2016-06-24 16:01 ` Shreyas B Prabhu
2016-06-24 19:43 ` Arnd Bergmann
0 siblings, 1 reply; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-24 16:01 UTC (permalink / raw)
To: Arnd Bergmann, linuxppc-dev
Cc: David Laight, rjw@rjwysocki.net, daniel.lezcano@linaro.org,
anton@samba.org, linux-pm@vger.kernel.org
On 06/24/2016 03:41 PM, Arnd Bergmann wrote:
> On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
>> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
>> So maybe something like:
>> diff = time_end - time_start;
>> if (diff >= INT_MAX/2)
>> diff_32 = INT_MAX/2/1000;
>> else
>> diff_32 = diff;
>> diff_32 += diff_32 >> 6;
>> diff_32 >>= 10;
>> }
>>
>> Adding an extra 1/32 makes the division by be something slightly below 1000.
>
> Why not change the definition of the time to nanoseconds and update
> the users accordingly?
>
> I see that cpuidle_enter_state() writes to the last_residency value,
> and it is read in three places: ladder_select_state(), menu_update()
> and show_state_time() (for sysfs).
>
Updating show_state_time() to convert ns to us gets bit messy since
show_state_##_name which is used for disable, usage and time simply
returns state_usage->_name. And state_usage->time updation happens in
cpuidle_enter_state() itself.
> If those functions are called less often than cpuidle_enter_state(),
> we could just move the division there. Since the divisor is constant,
> do_div() can convert it into a multiply and shift, or we could use
> your the code you suggest above, or use a 32-bit division most of
> the time:
>
> if (diff <= UINT_MAX)
> diff_32 = (u32)diff / NSECS_PER_USEC;
> else
> diff_32 = div_u64(diff, NSECS_PER_USEC;
>
> which gcc itself will turn into a multiplication or series of
> shifts on CPUs on which that is faster.
>
I'm not sure which division method of the three suggested here to use.
Does anyone have a strong preference?
--Shreyas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 16:01 ` Shreyas B Prabhu
@ 2016-06-24 19:43 ` Arnd Bergmann
2016-06-27 8:59 ` David Laight
0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 19:43 UTC (permalink / raw)
To: Shreyas B Prabhu
Cc: linuxppc-dev, David Laight, rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > If those functions are called less often than cpuidle_enter_state(),
> > we could just move the division there. Since the divisor is constant,
> > do_div() can convert it into a multiply and shift, or we could use
> > your the code you suggest above, or use a 32-bit division most of
> > the time:
> >
> > if (diff <= UINT_MAX)
> > diff_32 = (u32)diff / NSECS_PER_USEC;
> > else
> > diff_32 = div_u64(diff, NSECS_PER_USEC;
> >
> > which gcc itself will turn into a multiplication or series of
> > shifts on CPUs on which that is faster.
> >
> I'm not sure which division method of the three suggested here to use.
> Does anyone have a strong preference?
>
It depends on how accurate we want it and how long we expect
the times to be. The optimization for the 4.2 second cutoff
for doing a 32-bit division only makes sense if the majority
of the sleep times are below that.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 19:43 ` Arnd Bergmann
@ 2016-06-27 8:59 ` David Laight
2016-06-29 7:00 ` Shreyas B Prabhu
0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2016-06-27 8:59 UTC (permalink / raw)
To: 'Arnd Bergmann', Shreyas B Prabhu
Cc: rjw@rjwysocki.net, daniel.lezcano@linaro.org,
linuxppc-dev@lists.ozlabs.org, anton@samba.org,
linux-pm@vger.kernel.org
From: Arnd Bergmann
> Sent: 24 June 2016 20:43
> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > > If those functions are called less often than cpuidle_enter_state(),
> > > we could just move the division there. Since the divisor is constant,
> > > do_div() can convert it into a multiply and shift, or we could use
> > > your the code you suggest above, or use a 32-bit division most of
> > > the time:
> > >
> > > if (diff <= UINT_MAX)
> > > diff_32 = (u32)diff / NSECS_PER_USEC;
> > > else
> > > diff_32 = div_u64(diff, NSECS_PER_USEC;
> > >
> > > which gcc itself will turn into a multiplication or series of
> > > shifts on CPUs on which that is faster.
> > >
> > I'm not sure which division method of the three suggested here to use.
> > Does anyone have a strong preference?
> >
>
> It depends on how accurate we want it and how long we expect
> the times to be. The optimization for the 4.2 second cutoff
> for doing a 32-bit division only makes sense if the majority
> of the sleep times are below that.
It also depends if the code actually cares about the length of 'long' sleeps.
I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
result could be treated as 4.2 seconds without causing grief.
Actually the cost of a 64bit divide after a 4 second sleep will be noise.
OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
David
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-27 8:59 ` David Laight
@ 2016-06-29 7:00 ` Shreyas B Prabhu
0 siblings, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-29 7:00 UTC (permalink / raw)
To: David Laight, 'Arnd Bergmann'
Cc: linuxppc-dev@lists.ozlabs.org, rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On 06/27/2016 02:29 PM, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 24 June 2016 20:43
>> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
>>>> If those functions are called less often than cpuidle_enter_state(),
>>>> we could just move the division there. Since the divisor is constant,
>>>> do_div() can convert it into a multiply and shift, or we could use
>>>> your the code you suggest above, or use a 32-bit division most of
>>>> the time:
>>>>
>>>> if (diff <= UINT_MAX)
>>>> diff_32 = (u32)diff / NSECS_PER_USEC;
>>>> else
>>>> diff_32 = div_u64(diff, NSECS_PER_USEC;
>>>>
>>>> which gcc itself will turn into a multiplication or series of
>>>> shifts on CPUs on which that is faster.
>>>>
>>> I'm not sure which division method of the three suggested here to use.
>>> Does anyone have a strong preference?
>>>
>>
>> It depends on how accurate we want it and how long we expect
>> the times to be. The optimization for the 4.2 second cutoff
>> for doing a 32-bit division only makes sense if the majority
>> of the sleep times are below that.
>
> It also depends if the code actually cares about the length of 'long' sleeps.
> I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
> result could be treated as 4.2 seconds without causing grief.
>
> Actually the cost of a 64bit divide after a 4 second sleep will be noise.
> OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
>
Agreed. I'll use the code you suggested, with a small change-
Using diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6
since I want to err on the side of last_residency being more than actual.
And for long sleep cases, I'll use div_u64().
Thanks,
Shreyas
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-06-29 7:00 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 16:01 ` Shreyas B Prabhu
2016-06-24 19:43 ` Arnd Bergmann
2016-06-27 8:59 ` David Laight
2016-06-29 7:00 ` Shreyas B Prabhu
2016-06-24 9:30 ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).