* [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24 8:23 Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-24 8:23 UTC (permalink / raw)
To: rjw
Cc: daniel.lezcano, linux-pm, linuxppc-dev, anton, mpe, bsingharora,
Shreyas B. Prabhu
Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.
commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.
Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.
Fix this by replacing right shift by 10 with /1000 while calculating
last_residency.
Reported-by: Anton Blanchard <anton@samba.org>
Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v2
=============
- Fixing it in the cpuidle core code instead of driver code.
drivers/cpuidle/cpuidle.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..30d67a8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
local_irq_enable();
/*
- * local_clock() returns the time in nanosecond, let's shift
- * by 10 (divide by 1024) to have microsecond based time.
+ * local_clock() returns the time in nanosecond, let's
+ * divide by 1000 to have microsecond based time.
*/
- diff = (time_end - time_start) >> 10;
+ diff = (time_end - time_start) / 1000;
if (diff > INT_MAX)
diff = INT_MAX;
--
2.1.4
^ permalink raw reply related [flat|nested] 11+ messages in thread
* RE: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 9:30 ` kbuild test robot
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: David Laight @ 2016-06-24 9:00 UTC (permalink / raw)
To: 'Shreyas B. Prabhu', rjw@rjwysocki.net
Cc: linux-pm@vger.kernel.org, daniel.lezcano@linaro.org,
anton@samba.org, linuxppc-dev@lists.ozlabs.org
RnJvbTogU2hyZXlhcyBCLiBQcmFiaHUNCj4gU2VudDogMjQgSnVuZSAyMDE2IDA5OjI0DQo+DQo+
IFNub296ZSBpcyBhIHBvbGwgaWRsZSBzdGF0ZSBpbiBwb3dlcm52IGFuZCBwc2VyaWVzIHBsYXRm
b3Jtcy4gU25vb3plDQo+IGhhcyBhIHRpbWVvdXQgc28gdGhhdCBpZiBhIGNwdSBzdGF5cyBpbiBz
bm9vemUgZm9yIG1vcmUgdGhhbiB0YXJnZXQNCj4gcmVzaWRlbmN5IG9mIHRoZSBuZXh0IGF2YWls
YWJsZSBpZGxlIHN0YXRlLCB0aGVuIGl0IHdvdWxkIGV4aXQgdGhlcmVieQ0KPiBnaXZpbmcgY2hh
bmNlIHRvIHRoZSBjcHVpZGxlIGdvdmVybm9yIHRvIHJlLWV2YWx1YXRlIGFuZA0KPiBwcm9tb3Rl
IHRoZSBjcHUgdG8gYSBkZWVwZXIgaWRsZSBzdGF0ZS4gVGhlcmVmb3JlIHdoZW5ldmVyIHNub296
ZSBleGl0cw0KPiBkdWUgdG8gdGhpcyB0aW1lb3V0LCBpdHMgbGFzdF9yZXNpZGVuY3kgd2lsbCBi
ZSB0YXJnZXRfcmVzaWRlbmN5IG9mIG5leHQNCj4gZGVlcGVyIHN0YXRlLg0KPiANCj4gY29tbWl0
IGU5M2U1OWNlNWI4NSAoImNwdWlkbGU6IFJlcGxhY2Uga3RpbWVfZ2V0KCkgd2l0aCBsb2NhbF9j
bG9jaygpIikNCj4gY2hhbmdlZCB0aGUgbWF0aCBhcm91bmQgbGFzdF9yZXNpZGVuY3kgY2FsY3Vs
YXRpb24uIFNwZWNpZmljYWxseSwgd2hpbGUNCj4gY29udmVydGluZyBsYXN0X3Jlc2lkZW5jeSB2
YWx1ZSBmcm9tIG5hbm9zZWNvbmRzIHRvIG1pY3Jvc2Vjb25kcyBpdCBkb2VzDQo+IHJpZ2h0IHNo
aWZ0IGJ5IDEwLiBEdWUgdG8gdGhpcywgaW4gc25vb3plIHRpbWVvdXQgZXhpdCBzY2VuYXJpb3MN
Cj4gbGFzdF9yZXNpZGVuY3kgY2FsY3VsYXRlZCBpcyByb3VnaGx5IDIuMyUgbGVzcyB0aGFuIHRh
cmdldF9yZXNpZGVuY3kgb2YNCj4gbmV4dCBhdmFpbGFibGUgc3RhdGUuIFRoaXMgcGF0dGVybiBp
cyBwaWNrZWQgdXAgZ2V0X3R5cGljYWxfaW50ZXJ2YWwoKQ0KPiBpbiB0aGUgbWVudSBnb3Zlcm5v
ciBhbmQgdGhlcmVmb3JlIGV4cGVjdGVkX2ludGVydmFsIGluIG1lbnVfc2VsZWN0KCkgaXMNCj4g
ZnJlcXVlbnRseSBsZXNzIHRoYW4gdGhlIHRhcmdldF9yZXNpZGVuY3kgb2YgYW55IHN0YXRlIGJ1
dCBzbm9vemUuDQo+IA0KPiBEdWUgdG8gdGhpcyB3ZSBhcmUgZW50ZXJpbmcgc25vb3plIGF0IGEg
aGlnaGVyIHJhdGUsIHRoZXJlYnkgYWZmZWN0aW5nDQo+IHRoZSBzaW5nbGUgdGhyZWFkIHBlcmZv
cm1hbmNlLg0KPiANCj4gRml4IHRoaXMgYnkgcmVwbGFjaW5nIHJpZ2h0IHNoaWZ0IGJ5IDEwIHdp
dGggLzEwMDAgd2hpbGUgY2FsY3VsYXRpbmcNCj4gbGFzdF9yZXNpZGVuY3kuDQo+IA0KPiBSZXBv
cnRlZC1ieTogQW50b24gQmxhbmNoYXJkIDxhbnRvbkBzYW1iYS5vcmc+DQo+IEJpc2VjdGVkLWJ5
OiBTaGlscGFzcmkgRyBCaGF0IDxzaGlscGEuYmhhdEBsaW51eC52bmV0LmlibS5jb20+DQo+IFNp
Z25lZC1vZmYtYnk6IFNocmV5YXMgQi4gUHJhYmh1IDxzaHJleWFzQGxpbnV4LnZuZXQuaWJtLmNv
bT4NCj4gLS0tDQo+IENoYW5nZXMgaW4gdjINCj4gPT09PT09PT09PT09PQ0KPiAgLSBGaXhpbmcg
aXQgaW4gdGhlIGNwdWlkbGUgY29yZSBjb2RlIGluc3RlYWQgb2YgZHJpdmVyIGNvZGUuDQo+IA0K
PiAgZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYyB8IDYgKysrLS0tDQo+ICAxIGZpbGUgY2hhbmdl
ZCwgMyBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAtLWdpdCBhL2Ry
aXZlcnMvY3B1aWRsZS9jcHVpZGxlLmMgYi9kcml2ZXJzL2NwdWlkbGUvY3B1aWRsZS5jDQo+IGlu
ZGV4IGE0ZDAwNTkuLjMwZDY3YTggMTAwNjQ0DQo+IC0tLSBhL2RyaXZlcnMvY3B1aWRsZS9jcHVp
ZGxlLmMNCj4gKysrIGIvZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYw0KPiBAQCAtMjE4LDEwICsy
MTgsMTAgQEAgaW50IGNwdWlkbGVfZW50ZXJfc3RhdGUoc3RydWN0IGNwdWlkbGVfZGV2aWNlICpk
ZXYsIHN0cnVjdCBjcHVpZGxlX2RyaXZlciAqZHJ2LA0KPiAgCQlsb2NhbF9pcnFfZW5hYmxlKCk7
DQo+IA0KPiAgCS8qDQo+IC0JICogbG9jYWxfY2xvY2soKSByZXR1cm5zIHRoZSB0aW1lIGluIG5h
bm9zZWNvbmQsIGxldCdzIHNoaWZ0DQo+IC0JICogYnkgMTAgKGRpdmlkZSBieSAxMDI0KSB0byBo
YXZlIG1pY3Jvc2Vjb25kIGJhc2VkIHRpbWUuDQo+ICsJICogbG9jYWxfY2xvY2soKSByZXR1cm5z
IHRoZSB0aW1lIGluIG5hbm9zZWNvbmQsIGxldCdzDQo+ICsJICogZGl2aWRlIGJ5IDEwMDAgdG8g
aGF2ZSBtaWNyb3NlY29uZCBiYXNlZCB0aW1lLg0KPiAgCSAqLw0KPiAtCWRpZmYgPSAodGltZV9l
bmQgLSB0aW1lX3N0YXJ0KSA+PiAxMDsNCj4gKwlkaWZmID0gKHRpbWVfZW5kIC0gdGltZV9zdGFy
dCkgLyAxMDAwOw0KPiAgCWlmIChkaWZmID4gSU5UX01BWCkNCj4gIAkJZGlmZiA9IElOVF9NQVg7
DQoNClRoZSBpbnRlbnQgb2YgdGhlID4+IDEwIHdhcyBwcm9iYWJseSB0byBhdm9pZCBhbiBleHBl
bnNpdmUgNjRiaXQgZGl2aWRlLg0KU28gbWF5YmUgc29tZXRoaW5nIGxpa2U6DQoJZGlmZiA9IHRp
bWVfZW5kIC0gdGltZV9zdGFydDsNCglpZiAoZGlmZiA+PSBJTlRfTUFYLzIpDQoJCWRpZmZfMzIg
PSBJTlRfTUFYLzIvMTAwMDsNCgllbHNlDQoJCWRpZmZfMzIgPSBkaWZmOw0KCQlkaWZmXzMyICs9
IGRpZmZfMzIgPj4gNjsNCgkJZGlmZl8zMiA+Pj0gMTA7DQoJfQ0KDQpBZGRpbmcgYW4gZXh0cmEg
MS8zMiBtYWtlcyB0aGUgZGl2aXNpb24gYnkgYmUgc29tZXRoaW5nIHNsaWdodGx5IGJlbG93IDEw
MDAuDQoNCglEYXZpZA0KDQo=
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 9:00 ` David Laight
@ 2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
1 sibling, 0 replies; 11+ messages in thread
From: Daniel Lezcano @ 2016-06-24 10:05 UTC (permalink / raw)
To: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net
Cc: linux-pm@vger.kernel.org, anton@samba.org,
linuxppc-dev@lists.ozlabs.org
On 06/24/2016 11:00 AM, David Laight wrote:
> From: Shreyas B. Prabhu
>> Sent: 24 June 2016 09:24
>>
>> Snooze is a poll idle state in powernv and pseries platforms. Snooze
>> has a timeout so that if a cpu stays in snooze for more than target
>> residency of the next available idle state, then it would exit thereby
>> giving chance to the cpuidle governor to re-evaluate and
>> promote the cpu to a deeper idle state. Therefore whenever snooze exits
>> due to this timeout, its last_residency will be target_residency of next
>> deeper state.
>>
>> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
>> changed the math around last_residency calculation. Specifically, while
>> converting last_residency value from nanoseconds to microseconds it does
>> right shift by 10. Due to this, in snooze timeout exit scenarios
>> last_residency calculated is roughly 2.3% less than target_residency of
>> next available state. This pattern is picked up get_typical_interval()
>> in the menu governor and therefore expected_interval in menu_select() is
>> frequently less than the target_residency of any state but snooze.
>>
>> Due to this we are entering snooze at a higher rate, thereby affecting
>> the single thread performance.
>>
>> Fix this by replacing right shift by 10 with /1000 while calculating
>> last_residency.
>>
>> Reported-by: Anton Blanchard <anton@samba.org>
>> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>> ---
>> Changes in v2
>> =============
>> - Fixing it in the cpuidle core code instead of driver code.
>>
>> drivers/cpuidle/cpuidle.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a4d0059..30d67a8 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>> local_irq_enable();
>>
>> /*
>> - * local_clock() returns the time in nanosecond, let's shift
>> - * by 10 (divide by 1024) to have microsecond based time.
>> + * local_clock() returns the time in nanosecond, let's
>> + * divide by 1000 to have microsecond based time.
>> */
>> - diff = (time_end - time_start) >> 10;
>> + diff = (time_end - time_start) / 1000;
do_div ?
>> if (diff > INT_MAX)
>> diff = INT_MAX;
>
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> diff = time_end - time_start;
> if (diff >= INT_MAX/2)
> diff_32 = INT_MAX/2/1000;
> else
> diff_32 = diff;
> diff_32 += diff_32 >> 6;
> diff_32 >>= 10;
> }
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
@ 2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 16:01 ` Shreyas B Prabhu
1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 10:11 UTC (permalink / raw)
To: linuxppc-dev
Cc: David Laight, 'Shreyas B. Prabhu', rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> diff = time_end - time_start;
> if (diff >= INT_MAX/2)
> diff_32 = INT_MAX/2/1000;
> else
> diff_32 = diff;
> diff_32 += diff_32 >> 6;
> diff_32 >>= 10;
> }
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.
Why not change the definition of the time to nanoseconds and update
the users accordingly?
I see that cpuidle_enter_state() writes to the last_residency value,
and it is read in three places: ladder_select_state(), menu_update()
and show_state_time() (for sysfs).
If those functions are called less often than cpuidle_enter_state(),
we could just move the division there. Since the divisor is constant,
do_div() can convert it into a multiply and shift, or we could use
your the code you suggest above, or use a 32-bit division most of
the time:
if (diff <= UINT_MAX)
diff_32 = (u32)diff / NSECS_PER_USEC;
else
diff_32 = div_u64(diff, NSECS_PER_USEC;
which gcc itself will turn into a multiplication or series of
shifts on CPUs on which that is faster.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 10:11 ` Arnd Bergmann
@ 2016-06-24 16:01 ` Shreyas B Prabhu
2016-06-24 19:43 ` Arnd Bergmann
0 siblings, 1 reply; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-24 16:01 UTC (permalink / raw)
To: Arnd Bergmann, linuxppc-dev
Cc: David Laight, rjw@rjwysocki.net, daniel.lezcano@linaro.org,
anton@samba.org, linux-pm@vger.kernel.org
On 06/24/2016 03:41 PM, Arnd Bergmann wrote:
> On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
>> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
>> So maybe something like:
>> diff = time_end - time_start;
>> if (diff >= INT_MAX/2)
>> diff_32 = INT_MAX/2/1000;
>> else
>> diff_32 = diff;
>> diff_32 += diff_32 >> 6;
>> diff_32 >>= 10;
>> }
>>
>> Adding an extra 1/32 makes the division by be something slightly below 1000.
>
> Why not change the definition of the time to nanoseconds and update
> the users accordingly?
>
> I see that cpuidle_enter_state() writes to the last_residency value,
> and it is read in three places: ladder_select_state(), menu_update()
> and show_state_time() (for sysfs).
>
Updating show_state_time() to convert ns to us gets bit messy since
show_state_##_name which is used for disable, usage and time simply
returns state_usage->_name. And state_usage->time updation happens in
cpuidle_enter_state() itself.
> If those functions are called less often than cpuidle_enter_state(),
> we could just move the division there. Since the divisor is constant,
> do_div() can convert it into a multiply and shift, or we could use
> your the code you suggest above, or use a 32-bit division most of
> the time:
>
> if (diff <= UINT_MAX)
> diff_32 = (u32)diff / NSECS_PER_USEC;
> else
> diff_32 = div_u64(diff, NSECS_PER_USEC;
>
> which gcc itself will turn into a multiplication or series of
> shifts on CPUs on which that is faster.
>
I'm not sure which division method of the three suggested here to use.
Does anyone have a strong preference?
--Shreyas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 16:01 ` Shreyas B Prabhu
@ 2016-06-24 19:43 ` Arnd Bergmann
2016-06-27 8:59 ` David Laight
0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2016-06-24 19:43 UTC (permalink / raw)
To: Shreyas B Prabhu
Cc: linuxppc-dev, David Laight, rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > If those functions are called less often than cpuidle_enter_state(),
> > we could just move the division there. Since the divisor is constant,
> > do_div() can convert it into a multiply and shift, or we could use
> > your the code you suggest above, or use a 32-bit division most of
> > the time:
> >
> > if (diff <= UINT_MAX)
> > diff_32 = (u32)diff / NSECS_PER_USEC;
> > else
> > diff_32 = div_u64(diff, NSECS_PER_USEC;
> >
> > which gcc itself will turn into a multiplication or series of
> > shifts on CPUs on which that is faster.
> >
> I'm not sure which division method of the three suggested here to use.
> Does anyone have a strong preference?
>
It depends on how accurate we want it and how long we expect
the times to be. The optimization for the 4.2 second cutoff
for doing a 32-bit division only makes sense if the majority
of the sleep times are below that.
Arnd
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 19:43 ` Arnd Bergmann
@ 2016-06-27 8:59 ` David Laight
2016-06-29 7:00 ` Shreyas B Prabhu
0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2016-06-27 8:59 UTC (permalink / raw)
To: 'Arnd Bergmann', Shreyas B Prabhu
Cc: linuxppc-dev@lists.ozlabs.org, rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
From: Arnd Bergmann
> Sent: 24 June 2016 20:43
> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > > If those functions are called less often than cpuidle_enter_state(),
> > > we could just move the division there. Since the divisor is constant,
> > > do_div() can convert it into a multiply and shift, or we could use
> > > your the code you suggest above, or use a 32-bit division most of
> > > the time:
> > >
> > > if (diff <=3D UINT_MAX)
> > > diff_32 =3D (u32)diff / NSECS_PER_USEC;
> > > else
> > > diff_32 =3D div_u64(diff, NSECS_PER_USEC;
> > >
> > > which gcc itself will turn into a multiplication or series of
> > > shifts on CPUs on which that is faster.
> > >
> > I'm not sure which division method of the three suggested here to use.
> > Does anyone have a strong preference?
> >
>=20
> It depends on how accurate we want it and how long we expect
> the times to be. The optimization for the 4.2 second cutoff
> for doing a 32-bit division only makes sense if the majority
> of the sleep times are below that.
It also depends if the code actually cares about the length of 'long' sleep=
s.
I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
result could be treated as 4.2 seconds without causing grief.
Actually the cost of a 64bit divide after a 4 second sleep will be noise.
OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-27 8:59 ` David Laight
@ 2016-06-29 7:00 ` Shreyas B Prabhu
0 siblings, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2016-06-29 7:00 UTC (permalink / raw)
To: David Laight, 'Arnd Bergmann'
Cc: linuxppc-dev@lists.ozlabs.org, rjw@rjwysocki.net,
daniel.lezcano@linaro.org, anton@samba.org,
linux-pm@vger.kernel.org
On 06/27/2016 02:29 PM, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 24 June 2016 20:43
>> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
>>>> If those functions are called less often than cpuidle_enter_state(),
>>>> we could just move the division there. Since the divisor is constant,
>>>> do_div() can convert it into a multiply and shift, or we could use
>>>> your the code you suggest above, or use a 32-bit division most of
>>>> the time:
>>>>
>>>> if (diff <= UINT_MAX)
>>>> diff_32 = (u32)diff / NSECS_PER_USEC;
>>>> else
>>>> diff_32 = div_u64(diff, NSECS_PER_USEC;
>>>>
>>>> which gcc itself will turn into a multiplication or series of
>>>> shifts on CPUs on which that is faster.
>>>>
>>> I'm not sure which division method of the three suggested here to use.
>>> Does anyone have a strong preference?
>>>
>>
>> It depends on how accurate we want it and how long we expect
>> the times to be. The optimization for the 4.2 second cutoff
>> for doing a 32-bit division only makes sense if the majority
>> of the sleep times are below that.
>
> It also depends if the code actually cares about the length of 'long' sleeps.
> I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
> result could be treated as 4.2 seconds without causing grief.
>
> Actually the cost of a 64bit divide after a 4 second sleep will be noise.
> OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
>
Agreed. I'll use the code you suggested, with a small change-
Using diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6
since I want to err on the side of last_residency being more than actual.
And for long sleep cases, I'll use div_u64().
Thanks,
Shreyas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
@ 2016-06-24 9:30 ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 9:30 UTC (permalink / raw)
To: Shreyas B. Prabhu
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 900 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24898 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
2016-06-24 9:30 ` kbuild test robot
@ 2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 10:27 UTC (permalink / raw)
To: Shreyas B. Prabhu
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'
vim +242 drivers/cpuidle/cpuidle.c
56cfbf74a Colin Cross 2012-05-07 236 dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross 2012-05-07 237 } else {
56cfbf74a Colin Cross 2012-05-07 238 dev->last_residency = 0;
56cfbf74a Colin Cross 2012-05-07 239 }
56cfbf74a Colin Cross 2012-05-07 240
56cfbf74a Colin Cross 2012-05-07 241 return entered_state;
56cfbf74a Colin Cross 2012-05-07 @242 }
56cfbf74a Colin Cross 2012-05-07 243
56cfbf74a Colin Cross 2012-05-07 244 /**
907e30f1b Daniel Lezcano 2014-03-03 245 * cpuidle_select - ask the cpuidle framework to choose an idle state
:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state
:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16948 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] cpuidle: Fix last_residency division
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
` (2 preceding siblings ...)
2016-06-24 10:27 ` kbuild test robot
@ 2016-06-24 12:10 ` kbuild test robot
3 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-06-24 12:10 UTC (permalink / raw)
To: Shreyas B. Prabhu
Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
mpe, bsingharora, Shreyas B. Prabhu
[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]
Hi,
[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
All errors (new ones prefixed by >>):
drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod'
vim +224 drivers/cpuidle/cpuidle.c
218 local_irq_enable();
219
220 /*
221 * local_clock() returns the time in nanosecond, let's
222 * divide by 1000 to have microsecond based time.
223 */
> 224 diff = (time_end - time_start) / 1000;
225 if (diff > INT_MAX)
226 diff = INT_MAX;
227
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23385 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-06-29 7:00 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-24 8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24 9:00 ` David Laight
2016-06-24 10:05 ` Daniel Lezcano
2016-06-24 10:11 ` Arnd Bergmann
2016-06-24 16:01 ` Shreyas B Prabhu
2016-06-24 19:43 ` Arnd Bergmann
2016-06-27 8:59 ` David Laight
2016-06-29 7:00 ` Shreyas B Prabhu
2016-06-24 9:30 ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).