RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rafael@kernel.org>
Cc: "'Len Brown'" <lenb@kernel.org>,
	"'Yu Chen'" <yu.c.chen@intel.com>,
	"'Marcus Hähnel'" <mhaehnel@os.inf.tu-dresden.de>,
	"'Daniel Hackenberg'" <daniel.hackenberg@tu-dresden.de>,
	"'Robert Schöne'" <robert.schoene@tu-dresden.de>,
	mario.bielert@tu-dresden.de,
	"'Rafael J. Wysocki'" <rafael.j.wysocki@intel.com>,
	"'Alex Shi'" <alex.shi@linaro.org>,
	"'Ingo Molnar'" <mingo@kernel.org>,
	"'Rik van Riel'" <riel@redhat.com>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Nicholas Piggin'" <npiggin@gmail.com>,
	"'Linux PM'" <linux-pm@vger.kernel.org>,
	"'Peter Zijlstra'" <peterz@infradead.org>,
	"'Thomas Gleixner'" <tglx@linutronix.de>,
	"'Thomas Ilsche'" <thomas.ilsche@tu-dresden.de>,
	"Doug Smythies" <dsmythies@telus.net>
Subject: RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time
Date: Fri, 15 Dec 2017 08:16:08 -0800	[thread overview]
Message-ID: <001b01d375c0$07738c20$165aa460$@net> (raw)
In-Reply-To: Pqt7eQIAF7WlePqtGetBgu

On 2017.12.15 06:23 Rafael J. Wysocki wrote:
> On Fri, Dec 15, 2017 at 11:44 AM, Thomas Ilsche wrote:
>
>> we just saw very bad Powernightmares on a freshly installed dual socket 36
>> core

During the gap in this thread, I have seen some pretty bad ones also, but
not to the magnitude that you do.

> Can you please refrain from using marketing language for describing
> technical issues?

Ugh oh. I have just changed my web notes on this to use that term.

>> This issue is probably wasting a couple Megawatts globally,

I agree that there appears to be significant energy that can be saved.

>> I like to reinforce the offer
>> to help develop and test a patch,

Me also, however and as previously mentioned, I am stuck.

>> We acknowledge your suggestion to keep the scheduling ticks running instead
>> of using an additional fallback timer. Still, we need a solution with
>> manageable code changes without tightly coupling different software layers.
>> Unfortunately, the tick-timer is disabled early in the outer idle-loop in
>> do_idle(), but the C-state decision is taken in the governor.
>>
>> In your opinion, what is the best solution?
>> (a) re-enable the tick-timer? (not a very clean solution)
>> (b) moving the tick-timer disabling?
>
> Yes.
>
>> (could be very intrusive in terms of LOC,
>> we have no overview on which assumptions would break if we do this)
>
> Regardless, this is the long-term way to go IMO not just to address
> this issue, but also to avoid unneeded overhead of stopping the tick.
>
> The tick should only be stopped when we know for a fact that it is
> worth the overhead which is only when we know that the target
> residency of the idle state we want to enter is longer than the time
> to the next tick.  Note that this doesn't depend on the idle governor
> in use at all.
>
> And since this is the long-term way to go anyway, why don't we just do
> it right away?

Maybe that would also solve the C0 problem, I don't know.

>>
>> Regarding the C0/poll_idle problem, what is suitable as abort criterion:
>> (a) interrupt counts
>> (b) tick_nohz_get_sleep_length
>> (c) tick_sched->idle_expires
>
> Can you please remind me what that problem is exactly?

The C0 version of the problem is what I have been investigating for a long time
now. 

In an off-list e-mail, Yu claims to have found the root issue. I disagree that
it explains all the long poll_idle scenarios I have found. Obviously, I would
test any patch when available.

The problem is within the main loop of drivers/cpuidle/poll_state.c:

                while (!need_resched())
                        cpu_relax();

because not all isr's set the need_resched flag, and therefore it can end up
sitting in that loop for multiple seconds, instead of the expected few uSecs.

You may or may not recall that my main test scenario for this is a 100% load
on one CPU. Typically I observe something like for every unit of time
properly spent in idle state 0, the most energy costly idle state,
roughly 60,000 units of time are spent incorrectly in idle state 0, wasting
energy. Testing needs to done over hours, because the issue tends to come
in clusters being aggravated by some sort of beat frequencies
between events.

My proposed solution was this:

diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 7416b16..4d17d3d 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -5,16 +5,31 @@
  */

 #include <linux/cpuidle.h>
+#include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/sched/idle.h>

 static int __cpuidle poll_idle(struct cpuidle_device *dev,
                               struct cpuidle_driver *drv, int index)
 {
+       unsigned int next_timer_us, i;
+
        local_irq_enable();
        if (!current_set_polling_and_test()) {
-               while (!need_resched())
+               while (!need_resched()){
                        cpu_relax();
+
+                       /* Occasionally check for a new and long expected residency time. */
+                       if (!(i++ % 1024)) {
+                               local_irq_disable();
+                               next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
+                               local_irq_enable();
+                               /* need a better way to get threshold, including large margin */
+                               /* We are only trying to catch really bad cases here.         */
+                               if (next_timer_us > 100)
+                               break;
+                       }
+               }
        }
        current_clr_polling();

However,
I do not like that I am disabling and re-enabling local interrupts in some loop,
albeit only once per 1024 main loops. I do not know if there is another way to get
the same information without disabling interrupts.

I have not been able to figure out how to replace my "100" microseconds arbitrary
exit condition with some useful, machine specific exit criteria.

The root issue for me, is that I am not very familiar with the idle code.

... Doug

next prev parent reply	other threads:[~2017-12-15 16:16 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a181bf42-9462-476c-6dcd-39fc7151957f@tu-dresden.de>
2017-07-27 12:50 ` [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time Thomas Ilsche
2017-10-19  7:46   ` Len Brown
2017-10-20 16:31     ` Thomas Ilsche
2017-10-21 14:27     ` Doug Smythies
2017-10-20  0:17 ` Doug Smythies
2017-10-20 17:13   ` Thomas Ilsche
2017-10-21 14:28   ` Doug Smythies
2017-11-07 23:04     ` Thomas Ilsche
2017-11-08  4:53       ` Len Brown
2017-11-08  6:01         ` Yu Chen
2017-11-08 16:26         ` Doug Smythies
2017-11-08 16:26     ` Doug Smythies
2017-11-10 17:42     ` Doug Smythies
2017-11-14  6:12     ` Doug Smythies
2017-11-16 16:11       ` Thomas Ilsche
2017-11-16 22:47       ` Doug Smythies
2017-11-24 17:36       ` Doug Smythies
2017-12-02 12:56         ` Thomas Ilsche
2017-12-15 10:44           ` Thomas Ilsche
2017-12-15 14:23             ` Rafael J. Wysocki
2017-12-21  9:43               ` Thomas Ilsche
2017-12-22 19:37                 ` Rafael J. Wysocki
2017-12-15 16:16             ` Doug Smythies [this message]
2017-12-16  2:34               ` Rafael J. Wysocki
2017-11-25 16:30       ` Doug Smythies

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:7416b16 dfblob:4d17d3d )
 OR (
bs:"RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='001b01d375c0$07738c20$165aa460$@net' \
    --to=dsmythies@telus.net \
    --cc=alex.shi@linaro.org \
    --cc=daniel.hackenberg@tu-dresden.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=lenb@kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mario.bielert@tu-dresden.de \
    --cc=mhaehnel@os.inf.tu-dresden.de \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=riel@redhat.com \
    --cc=robert.schoene@tu-dresden.de \
    --cc=tglx@linutronix.de \
    --cc=thomas.ilsche@tu-dresden.de \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).