linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24  8:23 Shreyas B. Prabhu
  2016-06-24  9:00 ` David Laight
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-24  8:23 UTC (permalink / raw)
  To: rjw
  Cc: daniel.lezcano, linux-pm, linuxppc-dev, anton, mpe, bsingharora,
	Shreyas B. Prabhu

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by replacing right shift by 10 with /1000 while calculating
last_residency.

Reported-by: Anton Blanchard <anton@samba.org>
Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v2
=============
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..30d67a8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		local_irq_enable();
 
 	/*
-	 * local_clock() returns the time in nanosecond, let's shift
-	 * by 10 (divide by 1024) to have microsecond based time.
+	 * local_clock() returns the time in nanosecond, let's
+	 * divide by 1000 to have microsecond based time.
 	 */
-	diff = (time_end - time_start) >> 10;
+	diff = (time_end - time_start) / 1000;
 	if (diff > INT_MAX)
 		diff = INT_MAX;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24  9:00 ` David Laight
  2016-06-24 10:05   ` Daniel Lezcano
  2016-06-24 10:11   ` Arnd Bergmann
  2016-06-24  9:30 ` kbuild test robot
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: David Laight @ 2016-06-24  9:00 UTC (permalink / raw)
  To: 'Shreyas B. Prabhu', rjw@rjwysocki.net
  Cc: linux-pm@vger.kernel.org, daniel.lezcano@linaro.org,
	anton@samba.org, linuxppc-dev@lists.ozlabs.org

RnJvbTogU2hyZXlhcyBCLiBQcmFiaHUNCj4gU2VudDogMjQgSnVuZSAyMDE2IDA5OjI0DQo+DQo+
IFNub296ZSBpcyBhIHBvbGwgaWRsZSBzdGF0ZSBpbiBwb3dlcm52IGFuZCBwc2VyaWVzIHBsYXRm
b3Jtcy4gU25vb3plDQo+IGhhcyBhIHRpbWVvdXQgc28gdGhhdCBpZiBhIGNwdSBzdGF5cyBpbiBz
bm9vemUgZm9yIG1vcmUgdGhhbiB0YXJnZXQNCj4gcmVzaWRlbmN5IG9mIHRoZSBuZXh0IGF2YWls
YWJsZSBpZGxlIHN0YXRlLCB0aGVuIGl0IHdvdWxkIGV4aXQgdGhlcmVieQ0KPiBnaXZpbmcgY2hh
bmNlIHRvIHRoZSBjcHVpZGxlIGdvdmVybm9yIHRvIHJlLWV2YWx1YXRlIGFuZA0KPiBwcm9tb3Rl
IHRoZSBjcHUgdG8gYSBkZWVwZXIgaWRsZSBzdGF0ZS4gVGhlcmVmb3JlIHdoZW5ldmVyIHNub296
ZSBleGl0cw0KPiBkdWUgdG8gdGhpcyB0aW1lb3V0LCBpdHMgbGFzdF9yZXNpZGVuY3kgd2lsbCBi
ZSB0YXJnZXRfcmVzaWRlbmN5IG9mIG5leHQNCj4gZGVlcGVyIHN0YXRlLg0KPiANCj4gY29tbWl0
IGU5M2U1OWNlNWI4NSAoImNwdWlkbGU6IFJlcGxhY2Uga3RpbWVfZ2V0KCkgd2l0aCBsb2NhbF9j
bG9jaygpIikNCj4gY2hhbmdlZCB0aGUgbWF0aCBhcm91bmQgbGFzdF9yZXNpZGVuY3kgY2FsY3Vs
YXRpb24uIFNwZWNpZmljYWxseSwgd2hpbGUNCj4gY29udmVydGluZyBsYXN0X3Jlc2lkZW5jeSB2
YWx1ZSBmcm9tIG5hbm9zZWNvbmRzIHRvIG1pY3Jvc2Vjb25kcyBpdCBkb2VzDQo+IHJpZ2h0IHNo
aWZ0IGJ5IDEwLiBEdWUgdG8gdGhpcywgaW4gc25vb3plIHRpbWVvdXQgZXhpdCBzY2VuYXJpb3MN
Cj4gbGFzdF9yZXNpZGVuY3kgY2FsY3VsYXRlZCBpcyByb3VnaGx5IDIuMyUgbGVzcyB0aGFuIHRh
cmdldF9yZXNpZGVuY3kgb2YNCj4gbmV4dCBhdmFpbGFibGUgc3RhdGUuIFRoaXMgcGF0dGVybiBp
cyBwaWNrZWQgdXAgZ2V0X3R5cGljYWxfaW50ZXJ2YWwoKQ0KPiBpbiB0aGUgbWVudSBnb3Zlcm5v
ciBhbmQgdGhlcmVmb3JlIGV4cGVjdGVkX2ludGVydmFsIGluIG1lbnVfc2VsZWN0KCkgaXMNCj4g
ZnJlcXVlbnRseSBsZXNzIHRoYW4gdGhlIHRhcmdldF9yZXNpZGVuY3kgb2YgYW55IHN0YXRlIGJ1
dCBzbm9vemUuDQo+IA0KPiBEdWUgdG8gdGhpcyB3ZSBhcmUgZW50ZXJpbmcgc25vb3plIGF0IGEg
aGlnaGVyIHJhdGUsIHRoZXJlYnkgYWZmZWN0aW5nDQo+IHRoZSBzaW5nbGUgdGhyZWFkIHBlcmZv
cm1hbmNlLg0KPiANCj4gRml4IHRoaXMgYnkgcmVwbGFjaW5nIHJpZ2h0IHNoaWZ0IGJ5IDEwIHdp
dGggLzEwMDAgd2hpbGUgY2FsY3VsYXRpbmcNCj4gbGFzdF9yZXNpZGVuY3kuDQo+IA0KPiBSZXBv
cnRlZC1ieTogQW50b24gQmxhbmNoYXJkIDxhbnRvbkBzYW1iYS5vcmc+DQo+IEJpc2VjdGVkLWJ5
OiBTaGlscGFzcmkgRyBCaGF0IDxzaGlscGEuYmhhdEBsaW51eC52bmV0LmlibS5jb20+DQo+IFNp
Z25lZC1vZmYtYnk6IFNocmV5YXMgQi4gUHJhYmh1IDxzaHJleWFzQGxpbnV4LnZuZXQuaWJtLmNv
bT4NCj4gLS0tDQo+IENoYW5nZXMgaW4gdjINCj4gPT09PT09PT09PT09PQ0KPiAgLSBGaXhpbmcg
aXQgaW4gdGhlIGNwdWlkbGUgY29yZSBjb2RlIGluc3RlYWQgb2YgZHJpdmVyIGNvZGUuDQo+IA0K
PiAgZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYyB8IDYgKysrLS0tDQo+ICAxIGZpbGUgY2hhbmdl
ZCwgMyBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAtLWdpdCBhL2Ry
aXZlcnMvY3B1aWRsZS9jcHVpZGxlLmMgYi9kcml2ZXJzL2NwdWlkbGUvY3B1aWRsZS5jDQo+IGlu
ZGV4IGE0ZDAwNTkuLjMwZDY3YTggMTAwNjQ0DQo+IC0tLSBhL2RyaXZlcnMvY3B1aWRsZS9jcHVp
ZGxlLmMNCj4gKysrIGIvZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYw0KPiBAQCAtMjE4LDEwICsy
MTgsMTAgQEAgaW50IGNwdWlkbGVfZW50ZXJfc3RhdGUoc3RydWN0IGNwdWlkbGVfZGV2aWNlICpk
ZXYsIHN0cnVjdCBjcHVpZGxlX2RyaXZlciAqZHJ2LA0KPiAgCQlsb2NhbF9pcnFfZW5hYmxlKCk7
DQo+IA0KPiAgCS8qDQo+IC0JICogbG9jYWxfY2xvY2soKSByZXR1cm5zIHRoZSB0aW1lIGluIG5h
bm9zZWNvbmQsIGxldCdzIHNoaWZ0DQo+IC0JICogYnkgMTAgKGRpdmlkZSBieSAxMDI0KSB0byBo
YXZlIG1pY3Jvc2Vjb25kIGJhc2VkIHRpbWUuDQo+ICsJICogbG9jYWxfY2xvY2soKSByZXR1cm5z
IHRoZSB0aW1lIGluIG5hbm9zZWNvbmQsIGxldCdzDQo+ICsJICogZGl2aWRlIGJ5IDEwMDAgdG8g
aGF2ZSBtaWNyb3NlY29uZCBiYXNlZCB0aW1lLg0KPiAgCSAqLw0KPiAtCWRpZmYgPSAodGltZV9l
bmQgLSB0aW1lX3N0YXJ0KSA+PiAxMDsNCj4gKwlkaWZmID0gKHRpbWVfZW5kIC0gdGltZV9zdGFy
dCkgLyAxMDAwOw0KPiAgCWlmIChkaWZmID4gSU5UX01BWCkNCj4gIAkJZGlmZiA9IElOVF9NQVg7
DQoNClRoZSBpbnRlbnQgb2YgdGhlID4+IDEwIHdhcyBwcm9iYWJseSB0byBhdm9pZCBhbiBleHBl
bnNpdmUgNjRiaXQgZGl2aWRlLg0KU28gbWF5YmUgc29tZXRoaW5nIGxpa2U6DQoJZGlmZiA9IHRp
bWVfZW5kIC0gdGltZV9zdGFydDsNCglpZiAoZGlmZiA+PSBJTlRfTUFYLzIpDQoJCWRpZmZfMzIg
PSBJTlRfTUFYLzIvMTAwMDsNCgllbHNlDQoJCWRpZmZfMzIgPSBkaWZmOw0KCQlkaWZmXzMyICs9
IGRpZmZfMzIgPj4gNjsNCgkJZGlmZl8zMiA+Pj0gMTA7DQoJfQ0KDQpBZGRpbmcgYW4gZXh0cmEg
MS8zMiBtYWtlcyB0aGUgZGl2aXNpb24gYnkgYmUgc29tZXRoaW5nIHNsaWdodGx5IGJlbG93IDEw
MDAuDQoNCglEYXZpZA0KDQo=

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
  2016-06-24  9:00 ` David Laight
@ 2016-06-24  9:30 ` kbuild test robot
  2016-06-24 10:27 ` kbuild test robot
  2016-06-24 12:10 ` kbuild test robot
  3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24  9:30 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24898 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  9:00 ` David Laight
@ 2016-06-24 10:05   ` Daniel Lezcano
  2016-06-24 10:11   ` Arnd Bergmann
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Lezcano @ 2016-06-24 10:05 UTC (permalink / raw)
  To: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net
  Cc: linux-pm@vger.kernel.org, anton@samba.org,
	linuxppc-dev@lists.ozlabs.org

On 06/24/2016 11:00 AM, David Laight wrote:
> From: Shreyas B. Prabhu
>> Sent: 24 June 2016 09:24
>>
>> Snooze is a poll idle state in powernv and pseries platforms. Snooze
>> has a timeout so that if a cpu stays in snooze for more than target
>> residency of the next available idle state, then it would exit thereby
>> giving chance to the cpuidle governor to re-evaluate and
>> promote the cpu to a deeper idle state. Therefore whenever snooze exits
>> due to this timeout, its last_residency will be target_residency of next
>> deeper state.
>>
>> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
>> changed the math around last_residency calculation. Specifically, while
>> converting last_residency value from nanoseconds to microseconds it does
>> right shift by 10. Due to this, in snooze timeout exit scenarios
>> last_residency calculated is roughly 2.3% less than target_residency of
>> next available state. This pattern is picked up get_typical_interval()
>> in the menu governor and therefore expected_interval in menu_select() is
>> frequently less than the target_residency of any state but snooze.
>>
>> Due to this we are entering snooze at a higher rate, thereby affecting
>> the single thread performance.
>>
>> Fix this by replacing right shift by 10 with /1000 while calculating
>> last_residency.
>>
>> Reported-by: Anton Blanchard <anton@samba.org>
>> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>> ---
>> Changes in v2
>> =============
>>   - Fixing it in the cpuidle core code instead of driver code.
>>
>>   drivers/cpuidle/cpuidle.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a4d0059..30d67a8 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>   		local_irq_enable();
>>
>>   	/*
>> -	 * local_clock() returns the time in nanosecond, let's shift
>> -	 * by 10 (divide by 1024) to have microsecond based time.
>> +	 * local_clock() returns the time in nanosecond, let's
>> +	 * divide by 1000 to have microsecond based time.
>>   	 */
>> -	diff = (time_end - time_start) >> 10;
>> +	diff = (time_end - time_start) / 1000;

do_div ?

>>   	if (diff > INT_MAX)
>>   		diff = INT_MAX;
>
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> 	diff = time_end - time_start;
> 	if (diff >= INT_MAX/2)
> 		diff_32 = INT_MAX/2/1000;
> 	else
> 		diff_32 = diff;
> 		diff_32 += diff_32 >> 6;
> 		diff_32 >>= 10;
> 	}
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  9:00 ` David Laight
  2016-06-24 10:05   ` Daniel Lezcano
@ 2016-06-24 10:11   ` Arnd Bergmann
  2016-06-24 16:01     ` Shreyas B Prabhu
  1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 10:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net,
	daniel.lezcano@linaro.org, anton@samba.org,
	linux-pm@vger.kernel.org

On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
>         diff = time_end - time_start;
>         if (diff >= INT_MAX/2)
>                 diff_32 = INT_MAX/2/1000;
>         else
>                 diff_32 = diff;
>                 diff_32 += diff_32 >> 6;
>                 diff_32 >>= 10;
>         }
> 
> Adding an extra 1/32 makes the division by be something slightly below 1000.

Why not change the definition of the time to nanoseconds and update
the users accordingly?

I see that cpuidle_enter_state() writes to the last_residency value,
and it is read in three places: ladder_select_state(), menu_update()
and show_state_time() (for sysfs).

If those functions are called less often than cpuidle_enter_state(),
we could just move the division there. Since the divisor is constant,
do_div() can convert it into a multiply and shift, or we could use
your the code you suggest above, or use a 32-bit division most of
the time:

	if (diff <= UINT_MAX)
		diff_32 = (u32)diff / NSECS_PER_USEC;
	else
		diff_32 = div_u64(diff, NSECS_PER_USEC;

which gcc itself will turn into a multiplication or series of
shifts on CPUs on which that is faster.

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
  2016-06-24  9:00 ` David Laight
  2016-06-24  9:30 ` kbuild test robot
@ 2016-06-24 10:27 ` kbuild test robot
  2016-06-24 12:10 ` kbuild test robot
  3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 10:27 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'

vim +242 drivers/cpuidle/cpuidle.c

56cfbf74a Colin Cross    2012-05-07  236  		dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross    2012-05-07  237  	} else {
56cfbf74a Colin Cross    2012-05-07  238  		dev->last_residency = 0;
56cfbf74a Colin Cross    2012-05-07  239  	}
56cfbf74a Colin Cross    2012-05-07  240  
56cfbf74a Colin Cross    2012-05-07  241  	return entered_state;
56cfbf74a Colin Cross    2012-05-07 @242  }
56cfbf74a Colin Cross    2012-05-07  243  
56cfbf74a Colin Cross    2012-05-07  244  /**
907e30f1b Daniel Lezcano 2014-03-03  245   * cpuidle_select - ask the cpuidle framework to choose an idle state

:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state

:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16948 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
                   ` (2 preceding siblings ...)
  2016-06-24 10:27 ` kbuild test robot
@ 2016-06-24 12:10 ` kbuild test robot
  3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 12:10 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod'

vim +224 drivers/cpuidle/cpuidle.c

   218			local_irq_enable();
   219	
   220		/*
   221		 * local_clock() returns the time in nanosecond, let's
   222		 * divide by 1000 to have microsecond based time.
   223		 */
 > 224		diff = (time_end - time_start) / 1000;
   225		if (diff > INT_MAX)
   226			diff = INT_MAX;
   227	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23385 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 10:11   ` Arnd Bergmann
@ 2016-06-24 16:01     ` Shreyas B Prabhu
  2016-06-24 19:43       ` Arnd Bergmann
  0 siblings, 1 reply; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-24 16:01 UTC (permalink / raw)
  To: Arnd Bergmann, linuxppc-dev
  Cc: David Laight, rjw@rjwysocki.net, daniel.lezcano@linaro.org,
	anton@samba.org, linux-pm@vger.kernel.org



On 06/24/2016 03:41 PM, Arnd Bergmann wrote:
> On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
>> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
>> So maybe something like:
>>         diff = time_end - time_start;
>>         if (diff >= INT_MAX/2)
>>                 diff_32 = INT_MAX/2/1000;
>>         else
>>                 diff_32 = diff;
>>                 diff_32 += diff_32 >> 6;
>>                 diff_32 >>= 10;
>>         }
>>
>> Adding an extra 1/32 makes the division by be something slightly below 1000.
> 
> Why not change the definition of the time to nanoseconds and update
> the users accordingly?
> 
> I see that cpuidle_enter_state() writes to the last_residency value,
> and it is read in three places: ladder_select_state(), menu_update()
> and show_state_time() (for sysfs).
> 

Updating show_state_time() to convert ns to us gets bit messy since
show_state_##_name which is used for disable, usage and time simply
returns state_usage->_name. And state_usage->time updation happens in
cpuidle_enter_state() itself.

> If those functions are called less often than cpuidle_enter_state(),
> we could just move the division there. Since the divisor is constant,
> do_div() can convert it into a multiply and shift, or we could use
> your the code you suggest above, or use a 32-bit division most of
> the time:
> 
> 	if (diff <= UINT_MAX)
> 		diff_32 = (u32)diff / NSECS_PER_USEC;
> 	else
> 		diff_32 = div_u64(diff, NSECS_PER_USEC;
> 
> which gcc itself will turn into a multiplication or series of
> shifts on CPUs on which that is faster.
>
I'm not sure which division method of the three suggested here to use.
Does anyone have a strong preference?

--Shreyas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 16:01     ` Shreyas B Prabhu
@ 2016-06-24 19:43       ` Arnd Bergmann
  2016-06-27  8:59         ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 19:43 UTC (permalink / raw)
  To: Shreyas B Prabhu
  Cc: linuxppc-dev, David Laight, rjw@rjwysocki.net,
	daniel.lezcano@linaro.org, anton@samba.org,
	linux-pm@vger.kernel.org

On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > If those functions are called less often than cpuidle_enter_state(),
> > we could just move the division there. Since the divisor is constant,
> > do_div() can convert it into a multiply and shift, or we could use
> > your the code you suggest above, or use a 32-bit division most of
> > the time:
> > 
> >       if (diff <= UINT_MAX)
> >               diff_32 = (u32)diff / NSECS_PER_USEC;
> >       else
> >               diff_32 = div_u64(diff, NSECS_PER_USEC;
> > 
> > which gcc itself will turn into a multiplication or series of
> > shifts on CPUs on which that is faster.
> >
> I'm not sure which division method of the three suggested here to use.
> Does anyone have a strong preference?
> 

It depends on how accurate we want it and how long we expect
the times to be. The optimization for the 4.2 second cutoff
for doing a 32-bit division only makes sense if the majority
of the sleep times are below that.

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 19:43       ` Arnd Bergmann
@ 2016-06-27  8:59         ` David Laight
  2016-06-29  7:00           ` Shreyas B Prabhu
  0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2016-06-27  8:59 UTC (permalink / raw)
  To: 'Arnd Bergmann', Shreyas B Prabhu
  Cc: linuxppc-dev@lists.ozlabs.org, rjw@rjwysocki.net,
	daniel.lezcano@linaro.org, anton@samba.org,
	linux-pm@vger.kernel.org

From: Arnd Bergmann
> Sent: 24 June 2016 20:43
> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > > If those functions are called less often than cpuidle_enter_state(),
> > > we could just move the division there. Since the divisor is constant,
> > > do_div() can convert it into a multiply and shift, or we could use
> > > your the code you suggest above, or use a 32-bit division most of
> > > the time:
> > >
> > >       if (diff <=3D UINT_MAX)
> > >               diff_32 =3D (u32)diff / NSECS_PER_USEC;
> > >       else
> > >               diff_32 =3D div_u64(diff, NSECS_PER_USEC;
> > >
> > > which gcc itself will turn into a multiplication or series of
> > > shifts on CPUs on which that is faster.
> > >
> > I'm not sure which division method of the three suggested here to use.
> > Does anyone have a strong preference?
> >
>=20
> It depends on how accurate we want it and how long we expect
> the times to be. The optimization for the 4.2 second cutoff
> for doing a 32-bit division only makes sense if the majority
> of the sleep times are below that.

It also depends if the code actually cares about the length of 'long' sleep=
s.
I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
result could be treated as 4.2 seconds without causing grief.

Actually the cost of a 64bit divide after a 4 second sleep will be noise.
OTOH a 64bit divide after a sleep that lasted a few ns will be significant.

	David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-27  8:59         ` David Laight
@ 2016-06-29  7:00           ` Shreyas B Prabhu
  0 siblings, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-29  7:00 UTC (permalink / raw)
  To: David Laight, 'Arnd Bergmann'
  Cc: linuxppc-dev@lists.ozlabs.org, rjw@rjwysocki.net,
	daniel.lezcano@linaro.org, anton@samba.org,
	linux-pm@vger.kernel.org



On 06/27/2016 02:29 PM, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 24 June 2016 20:43
>> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
>>>> If those functions are called less often than cpuidle_enter_state(),
>>>> we could just move the division there. Since the divisor is constant,
>>>> do_div() can convert it into a multiply and shift, or we could use
>>>> your the code you suggest above, or use a 32-bit division most of
>>>> the time:
>>>>
>>>>       if (diff <= UINT_MAX)
>>>>               diff_32 = (u32)diff / NSECS_PER_USEC;
>>>>       else
>>>>               diff_32 = div_u64(diff, NSECS_PER_USEC;
>>>>
>>>> which gcc itself will turn into a multiplication or series of
>>>> shifts on CPUs on which that is faster.
>>>>
>>> I'm not sure which division method of the three suggested here to use.
>>> Does anyone have a strong preference?
>>>
>>
>> It depends on how accurate we want it and how long we expect
>> the times to be. The optimization for the 4.2 second cutoff
>> for doing a 32-bit division only makes sense if the majority
>> of the sleep times are below that.
> 
> It also depends if the code actually cares about the length of 'long' sleeps.
> I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
> result could be treated as 4.2 seconds without causing grief.
> 
> Actually the cost of a 64bit divide after a 4 second sleep will be noise.
> OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
> 
Agreed. I'll use the code you suggested, with a small change-
Using diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6
since I want to err on the side of last_residency being more than actual.

And for long sleep cases, I'll use div_u64().

Thanks,
Shreyas

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-06-29  7:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24  9:00 ` David Laight
2016-06-24 10:05   ` Daniel Lezcano
2016-06-24 10:11   ` Arnd Bergmann
2016-06-24 16:01     ` Shreyas B Prabhu
2016-06-24 19:43       ` Arnd Bergmann
2016-06-27  8:59         ` David Laight
2016-06-29  7:00           ` Shreyas B Prabhu
2016-06-24  9:30 ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).