linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rafael@kernel.org>
Cc: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>,
	"'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>,
	"'Peter Zijlstra'" <peterz@infradead.org>,
	"'LKML'" <linux-kernel@vger.kernel.org>,
	"'Frederic Weisbecker'" <frederic@kernel.org>,
	"'Mel Gorman'" <mgorman@suse.de>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Chen, Hu'" <hu1.chen@intel.com>,
	"'Quentin Perret'" <quentin.perret@arm.com>,
	"'Linux PM'" <linux-pm@vger.kernel.org>,
	"'Giovanni Gherdovich'" <ggherdovich@suse.cz>
Subject: RE: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems
Date: Sun, 6 Oct 2019 07:46:32 -0700	[thread overview]
Message-ID: <000001d57c54$db31f8c0$9195ea40$@net> (raw)
In-Reply-To: <CAJZ5v0gu=rALS9ZLNMDT3cw_sT2m8XCKP6+AW3488x2Q9EXM3g@mail.gmail.com>

On 2019.10.01 02:32 Rafael J. Wysocki wrote:
> On Sun, Sep 29, 2019 at 6:05 PM Doug Smythies <dsmythies@telus.net> wrote:
>> On 2019.09.26 09:32 Doug Smythies wrote:
>>
>>> If the deepest idle state is disabled, the system
>>> can become somewhat unstable, with anywhere between no problem
>>> at all, to the occasional temporary jump using a lot more
>>> power for a few seconds, to a permanent jump using a lot more
>>> power continuously. I have been unable to isolate the exact
>>> test load conditions under which this will occur. However,
>>> temporarily disabling and then enabling other idle states
>>> seems to make for a somewhat repeatable test. It is important
>>> to note that the issue occurs with only ever disabling the deepest
>>> idle state, just not reliably.
>>>
>>> I want to know how you want to proceed before I do a bunch of
>>> regression testing.
>>
>> I did some regression testing anyhow, more to create and debug
>> a methodology than anything else.
>>
>>> On 2018.12.11 03:50 Rafael J. Wysocki wrote:
>>>
>>>> v7 -> v8:
>>>>  * Apply the selection rules to the idle deepest state as well as to
>>>>    the shallower ones (the deepest idle state was treated differently
>>>>    before by mistake).
>>>>  * Subtract 1/2 of the exit latency from the measured idle duration
>>>>    in teo_update() (instead of subtracting the entire exit latency).
>>>>    This makes the idle state selection be slightly more performance-
>>>>   oriented.
>>>
>>> I have isolated the issue to a subset of the v7 to v8 changes, however
>>> it was not the exit latency changes.
>>>
>>> The partial revert to V7 changes I made were (on top of 5.3):
>>
>> The further testing showed a problem or two with my partial teo-v7 reversion
>> (I call it teo-v12) under slightly different testing conditions.

Correction:
There was no problem with my partial reversion kernel (a.k.a. teo-v12). The problem
was confusion over which kernel I was actually running for whatever test.

>>
>> I also have a 5.3 based kernel with the current teo reverted and the entire
>> teo-v7 put in its place. I have yet to find a idle state disabled related issue
>> with this kernel.
>>
>> I'll come back to this thread at a later date with better details and test results.
>
> Thanks for this work!
>
> Please also note that there is a teo patch in 5.4-rc1 that may make a
> difference in principle.

Yes, actually this saga started from somewhere between kernel 5.3 and 5.4-rc1,
and did include those teo patches, which actually significantly increases the
probability of the issue occurring.

When the deepest idle state is disabled, and the all states search loop exits
normally, it might incorrectly re-evaluate a previous idle state previously
deemed not worthy of the check. This was introduced between teo development
versions 7 and 8. The fix is to move the code back inside the loop.
(I'll submit a patch in a day or two).

I do not think I stated it clearly before: The problem here is that some CPUs
seem to get stuck in idle state 0, and when they do power consumption spikes,
often by several hundred % and often indefinitely.

I made a hack job automated test:
Kernel	tests		fail rate
5.4-rc1	 6616		13.45%
5.3		 2376		 4.50%
5.3-teov7	12136		 0.00%  <<< teo.c reverted and teov7 put in its place.
5.4-rc1-ds	11168        0.00%  <<< proposed patch (> 7 hours test time)

Proposed patch (on top of kernel 5.4-rc1):

doug@s15:~/temp-k-git/linux$ git diff
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index b5a0e49..0502aa9 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -276,8 +276,22 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
                if (idx < 0)
                        idx = i; /* first enabled state */

-               if (s->target_residency > duration_us)
+               if (s->target_residency > duration_us){
+                       /*
+                        * If the "hits" metric of the idle state matching the sleep length is
+                        * greater than its "misses" metric, that is the one to use.  Otherwise,
+                        * it is more likely that one of the shallower states will match the
+                        * idle duration observed after wakeup, so take the one with the maximum
+                        * "early hits" metric, but if that cannot be determined, just use the
+                        * state selected so far.
+                        */
+                       if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses &&
+                           max_early_idx >= 0) {
+                               idx = max_early_idx;
+                               duration_us = drv->states[idx].target_residency;
+                       }
                        break;
+               }

                if (s->exit_latency > latency_req && constraint_idx > i)
                        constraint_idx = i;
@@ -293,20 +307,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
        }

        /*
-        * If the "hits" metric of the idle state matching the sleep length is
-        * greater than its "misses" metric, that is the one to use.  Otherwise,
-        * it is more likely that one of the shallower states will match the
-        * idle duration observed after wakeup, so take the one with the maximum
-        * "early hits" metric, but if that cannot be determined, just use the
-        * state selected so far.
-        */
-       if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses &&
-           max_early_idx >= 0) {
-               idx = max_early_idx;
-               duration_us = drv->states[idx].target_residency;
-       }
-
-       /*
         * If there is a latency constraint, it may be necessary to use a
         * shallower idle state than the one selected so far.
         */



  reply	other threads:[~2019-10-06 14:46 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-26 16:31 [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems Doug Smythies
2019-09-29 16:04 ` Doug Smythies
2019-10-01  9:31   ` Rafael J. Wysocki
2019-10-06 14:46     ` Doug Smythies [this message]
2019-10-06 15:34       ` Rafael J. Wysocki
2019-10-08  6:20         ` Doug Smythies
2019-10-08  9:51           ` Rafael J. Wysocki
2019-10-08 10:49             ` Rafael J. Wysocki
2019-10-08 23:19               ` Rafael J. Wysocki
2019-10-09 13:36                 ` Rafael J. Wysocki
2019-10-10  7:05                   ` Doug Smythies
2019-10-10  8:42                     ` Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2018-12-17  1:53 Doug Smythies
2018-12-17 11:59 ` Rafael J. Wysocki
2018-12-11 11:49 Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000001d57c54$db31f8c0$9195ea40$@net' \
    --to=dsmythies@telus.net \
    --cc=daniel.lezcano@linaro.org \
    --cc=frederic@kernel.org \
    --cc=ggherdovich@suse.cz \
    --cc=hu1.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=peterz@infradead.org \
    --cc=quentin.perret@arm.com \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=srinivas.pandruvada@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).