linux-pm.vger.kernel.org archive mirror
* CPUs do not go idle - excessive energy consumption
From: Doug Smythies @ 2017-10-12 15:28 UTC (permalink / raw)
  To: linux-pm; +Cc: Doug Smythies

Hi all,

I am observing higher than nominal processor package power consumption under
some conditions. The worst case so far was an extra 6.3 watts, or 25%; more
typically it is between 0 and 4 watts (over 1 minute sampling intervals).

The simplest use case found so far is a single threaded 100% load on one CPU.
The problem appears to be that CPUs that should be idle and in a deep C state
sometimes are not. It is as though they have been forgotten, and only when they hit
the watchdog timer, or some other event, do they generally get sorted out. I have
looked at kernels back as far as 4.5, and the issue is always present.
  
I am looking for assistance in finding the root cause, because
I am unfamiliar with the idle code and have made little (O.K. no) progress on my own.

Details (for my test server, with an Intel i7-2600K, currently kernel 4.14-rc4):

In an otherwise idle system, with one processor loaded 100%, processor power
consumption is expected to be about 24 watts. However, power consumption of up
to about 28 watts has sometimes been observed. Of course, the first thought is that the
extra power consumption is just due to some task, because the "otherwise idle" system
isn't really completely so. If that were true, we would expect the intel_pstate CPU
frequency scaling driver to be called often. It is not.
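
As a rough cross-check on package power figures like the ones above, average
power over an interval can be derived from two readings of a cumulative energy
counter. The sketch below is only an illustration: it assumes the powercap
sysfs files under /sys/class/powercap/intel-rapl:0 are available, which this
thread does not mention (the measurements here came from turbostat). The helper
names are hypothetical.

```python
from pathlib import Path

# Assumed powercap location for package 0; not taken from this thread.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(rapl_dir=RAPL_DIR, name="energy_uj"):
    """Read one integer value (microjoules) from the powercap directory."""
    return int((Path(rapl_dir) / name).read_text().strip())


def average_power_watts(start_uj, end_uj, interval_s, max_range_uj):
    """Average power in watts between two cumulative energy readings.

    The counter wraps around at roughly max_range_uj, so a negative
    difference is treated as a single wrap during the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / interval_s
```

For a one-minute sample, one would call read_counter_uj() twice, 60 seconds
apart, read max_energy_range_uj once with the same helper, and pass the three
values to average_power_watts(). A result of 24 W then corresponds to 1440 J
consumed over the minute.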

Although it is difficult to correlate higher power samples from turbostat with
intel_pstate trace data, it seems they are due to samples with both high load and
long duration. For a few years now, we would never expect to see high load without
high-frequency calls to the driver. Also, typically for my test server, with many
services turned off, other tasks should take very little run time per time slice.
Examples (from trace data acquired and post processed with intel_pstate_tracer.py):

CPU 3: Load 87.2%; Duration 4000.091 msec; Comment watchdog.
CPU 5: Load 100%; Duration 2536.22 msec; Comment idle.
CPU 3: Load 100%; Duration 1184.027 msec; Comment idle.
(I have thousands more examples.)

Higher power consumption with no user load at all seems to be rather rare
(between 0 and 40 occurrences in one hour, using arbitrary thresholds), and hard
to detect via turbostat output. However, if the system is booted with
intel_idle.max_cstate=1, then higher power consumption with no user load is very
common, as are much wilder variations in power consumption (from ~9 watts to
~32 watts). For reference, the average rate is about 600 occurrences per hour
with one 100% load, but it does vary.

Note: While I tend to use taskset so as to know which CPU should be busy,
the issue also occurs when taskset is not used.

Note: I have arbitrarily set the threshold for this condition at >= 10% load
and >= 250 milliseconds duration (the time between calls to the intel_pstate
driver), and written a program to parse such samples out of the .csv files
generated by intel_pstate_tracer.py.
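
The threshold filter described in this note could look roughly like the sketch
below. It is not the program referred to above, only a minimal illustration.
The column names (common_cpu, load, duration_ms) are assumptions about the .csv
layout and are passed as parameters so they can be adjusted to match the actual
header.

```python
import csv
import io

LOAD_THRESHOLD_PCT = 10.0       # >= 10% load
DURATION_THRESHOLD_MS = 250.0   # >= 250 ms between driver calls


def find_suspect_samples(csv_file, load_col="load",
                         duration_col="duration_ms", cpu_col="common_cpu"):
    """Return (cpu, load, duration_ms) for samples meeting both thresholds.

    Column names are assumed, not confirmed; rows with missing or
    non-numeric values are skipped.
    """
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    hits = []
    for row in reader:
        try:
            load = float(row[load_col])
            duration = float(row[duration_col])
        except (KeyError, TypeError, ValueError):
            continue
        if load >= LOAD_THRESHOLD_PCT and duration >= DURATION_THRESHOLD_MS:
            hits.append((row.get(cpu_col), load, duration))
    return hits


if __name__ == "__main__":
    # Two of the examples quoted earlier, plus one short, busy sample.
    demo = io.StringIO(
        "common_cpu, load, duration_ms\n"
        "3, 87.2, 4000.091\n"
        "5, 100, 2536.22\n"
        "2, 100, 10.0\n"
    )
    for cpu, load, duration in find_suspect_samples(demo):
        print(f"CPU {cpu}: Load {load}%; Duration {duration} msec")
```

Counting the returned samples per hour of trace gives the occurrence rates
quoted in this message.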

Note: This might be a separate problem, but the issue is made worse
by taking CPUs off-line and then bringing them back on-line (I think).

... Doug


* Re: CPUs do not go idle - excessive energy consumption
From: Rafael J. Wysocki @ 2017-10-12 15:52 UTC (permalink / raw)
  To: Doug Smythies; +Cc: Linux PM

Hi,

On Thu, Oct 12, 2017 at 5:28 PM, Doug Smythies <dsmythies@telus.net> wrote:
> Hi all,
>
> I am observing higher than nominal processor package power consumption, under
> some conditions. The worst case, so far, was an extra 6.3 watts or 25%, however
> more typically it is between 0 and 4 watts (over 1 minute sampling intervals).
>
> ...[snip]...

For starters, you can try to apply this patch
https://patchwork.kernel.org/patch/9866841/ and see if it makes any
difference.

Thanks,
Rafael


* RE: CPUs do not go idle - excessive energy consumption
From: Doug Smythies @ 2017-10-12 16:36 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'; +Cc: 'Linux PM'

On 2017.10.12 08:53 Rafael J. Wysocki wrote:
> On Thu, Oct 12, 2017 at 5:28 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>
>> I am observing higher than nominal processor package power consumption, under
>> some conditions. The worst case, so far, was an extra 6.3 watts or 25%, however
>> more typically it is between 0 and 4 watts (over 1 minute sampling intervals).
>>
...[snip]...

> For starters, you can try to apply this patch
> https://patchwork.kernel.org/patch/9866841/ and see if it makes any
> difference.

Oh darn, I missed the importance and relevance of that e-mail (but I did get it),
which might have saved me a lot of time.
It sounds like it is exactly the same issue.
I'll try it as soon as I can either resolve the conflicts, or go back
and apply it to whatever kernel version would result in no conflicts.

I'll report back once I have some test results.

... Doug


* RE: CPUs do not go idle - excessive energy consumption
From: Doug Smythies @ 2017-10-13 14:10 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'; +Cc: 'Linux PM'

On 2017.10.12 09:36 Doug Smythies wrote:
> On 2017.10.12 08:53 Rafael J. Wysocki wrote:
>> On Thu, Oct 12, 2017 at 5:28 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>
>>> I am observing higher than nominal processor package power consumption, under
>>> some conditions. The worst case, so far, was an extra 6.3 watts or 25%, however
>>> more typically it is between 0 and 4 watts (over 1 minute sampling intervals).
>>>
> ...[snip]...
>
>> For starters, you can try to apply this patch
>> https://patchwork.kernel.org/patch/9866841/ and see if it makes any
>> difference.
>
> Oh darn, I missed the importance and relevance of that e-mail (but I did get it),
> which might have saved me a lot of time.
> It sounds like it is exactly the same issue.
> I'll try it as soon as I can either resolve the conflicts, or go back
> and apply it to whatever kernel version would result in no conflicts.
>
> I'll report back once I have some test results.

The patch did not solve the problem.

Because the test results vary, and always have, it is hard to know for
certain if there was some improvement, but I did have a couple of 10 minute
tests with 0 occurrences (based on my arbitrary thresholds). A one hour
test had 530 occurrences.

... Doug


* RE: CPUs do not go idle - excessive energy consumption
From: Doug Smythies @ 2017-10-20  0:16 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'; +Cc: 'Linux PM'

On 2017.10.13 07:10 Doug Smythies wrote:
> On 2017.10.12 09:36 Doug Smythies wrote:
>> On 2017.10.12 08:53 Rafael J. Wysocki wrote:
>>> On Thu, Oct 12, 2017 at 5:28 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>>
>>>> I am observing higher than nominal processor package power consumption, under
>>>> some conditions. The worst case, so far, was an extra 6.3 watts or 25%, however
>>>> more typically it is between 0 and 4 watts (over 1 minute sampling intervals).
>>>>
>> ...[snip]...
>>
>>> For starters, you can try to apply this patch
>>> https://patchwork.kernel.org/patch/9866841/ and see if it makes any
>>> difference.
>>
>> Oh darn, I missed the importance and relevance of that e-mail (but I did get it),
>> which might have saved me a lot of time.
>> It sounds like it is exactly the same issue.
>> I'll try it as soon as I can either resolve the conflicts, or go back
>> and apply it to whatever kernel version would result in no conflicts.
>>
>> I'll report back once I have some test results.

I'll report back by replying to the above referenced e-mail.

> The patch did not solve the problem.

The patch did not work as it was sent.
The changes I had to make will also be in the above-mentioned reply.

The patch still doesn't solve this particular problem.
Merely disabling idle state 0 does solve the problem.
I have been trying to demonstrate a downside to disabling
idle state 0, and haven't found one. pipe-test is something we used
before, but I don't see any difference on my computer.
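
For anyone wanting to repeat the experiment, one way to disable an idle state
at run time is the per-CPU cpuidle "disable" file in sysfs. The sketch below
assumes that interface (/sys/devices/system/cpu/cpuN/cpuidle/stateM/disable)
is present and that the script runs as root; it is not necessarily how the
state was disabled for the tests above, and the function name is hypothetical.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def set_idle_state_disabled(state, disabled, cpu_root=CPU_ROOT):
    """Write 1 (disable) or 0 (enable) to stateN/disable for every CPU.

    Returns the list of sysfs files that were written. CPUs without the
    requested state are skipped.
    """
    changed = []
    for cpu_dir in sorted(Path(cpu_root).glob("cpu[0-9]*")):
        knob = cpu_dir / "cpuidle" / f"state{state}" / "disable"
        if knob.exists():
            knob.write_text("1" if disabled else "0")
            changed.append(knob)
    return changed


if __name__ == "__main__":
    # Disable idle state 0 on all CPUs; pass False to re-enable it.
    for path in set_idle_state_disabled(0, True):
        print(f"wrote 1 to {path}")
```

The setting does not persist across reboots, so it is easy to compare runs with
the state enabled and disabled on the same boot.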

I have reviewed and reminded myself about the issues with the menu governor.

> 
> Because the test results vary, and always have, it is hard to know for
> certain if there was some improvement, but I did have a couple of 10 minute
> tests with 0 occurrences (based on my arbitrary thresholds). A one hour
> test had 530 occurrences.
>
> ... Doug
