linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Doug Smythies" <dsmythies@telus.net>
To: 'Thomas Ilsche' <thomas.ilsche@tu-dresden.de>,
	'Yu Chen' <yu.c.chen@intel.com>
Cc: "'Marcus Hähnel'" <mhaehnel@os.inf.tu-dresden.de>,
	"'Daniel Hackenberg'" <daniel.hackenberg@tu-dresden.de>,
	"'Robert Schöne'" <robert.schoene@tu-dresden.de>,
	mario.bielert@tu-dresden.de,
	"'Rafael J. Wysocki'" <rafael.j.wysocki@intel.com>,
	"'Alex Shi'" <alex.shi@linaro.org>,
	"'Ingo Molnar'" <mingo@kernel.org>,
	"'Rik van Riel'" <riel@redhat.com>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Nicholas Piggin'" <npiggin@gmail.com>,
	linux-pm@vger.kernel.org, "'Len Brown'" <lenb@kernel.org>,
	"Doug Smythies" <dsmythies@telus.net>
Subject: RE: [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time
Date: Fri, 24 Nov 2017 09:36:28 -0800	[thread overview]
Message-ID: <000a01d3654a$c4996990$4dcc3cb0$@net> (raw)
In-Reply-To: FS1bekJUDC2CsFSwpebYuw

@Yu: among other things, off-list you asked for some benchmark data. See below.

@Thomas: I did several phoronix tests with a kernel with both your patch,
for idle states deeper than 0, and my patch, specific to idle state 0.
Every test was done with a stock 4.14 kernel and a patched kernel with:
The Thomas part disabled; The default Thomas setting of 10000 uSec
timeout; A Thomas setting of 1000 uSec timeout; Idle states 0,1,2,3
disabled (my system max is state 4). I have yet to find a good phoronix
test to demonstrate the idle states deeper than 0 improvements. There was
never a degradation (other than already listed below) due to your patch.

@All: I am just trying to get some baseline data here, I am not suggesting
either patch is in a final form. For my idle state 0 patch (further below)
I seek help to move the concept to a real robust patch.
All power/energy measurements were processor package power as
measured with turbostat.

Conclusion: The most significant phoronix test improvements are for single
threaded tests. This is not a surprise.

All test results listed are for the stock verses idle state 0 fix only:

Test 1: A contstant 100% load on one CPU: 23% less energy.

Phoronix compress-lzma: 15.2% less energy; 3.6% performance improvement.
Phoronix encode-mp3: 3.5% less energy; 1% performance improvement.
Phoronix himeno: 13.3% less energy; 5% performance improvement. (Note lots of test to test variability.)
Phoronix crafty: 4.6% less energy; 0.5% performance improvement.
Phoronix apache: 3% less energy; 3% performance improvement.
Phoronix sudokut: undetectable energy or performance change.
Phoronix iozone (1,1,1)(4Kb, 512MB, write): 3% less energy; undetectable performance change.
Phoronix mafft: 1.8% MORE energy; ~3.5% performance DEGRADATION. (investigation pending.)
Phoronix ffmpeg: undetectable energy or performance change.

Anticipated question: The energy improvements make sense, but
why the performance improvements?
Answer: Performance is actually slightly improved because when
idle state 0 powernightmares were running on other cores,
the maximum clock rate is reduced on my processor.
Excerpt from turbostat output:
35 * 100.0 = 3500.0 MHz max turbo 4 active cores
36 * 100.0 = 3600.0 MHz max turbo 3 active cores
37 * 100.0 = 3700.0 MHz max turbo 2 active cores
38 * 100.0 = 3800.0 MHz max turbo 1 active cores

On 2017.11.16 14:48 Doug Smythies wrote:
> On 2017.11.16 08:11 Thomas Ilsche wrote:
>
>>> Actually, the watchdog_timer_fn does set the "need_resched" condition, and will
>>> cause the state 0 idle to exit normally.
>>> 
>>> But yes, tick_sched_timer and a few others (for example: sched_rt_period_timer,
>>> clocksource_watchdog) do not set the "need_resched" condition, and, as you
>>> mentioned, will not cause the state 0 idle to exit as it should.
>>> 
>>> Conclusion: Currently the exit condition in drivers/cpuidle/poll_state.c
>>> is insufficient to guarantee proper operation.
>
> Or: Any interrupt out of the idle loop must return with "need_resched" 
>
>>> 
>>> This:
>>> 
>>> while (!need_resched())
>>> 
>>> is not enough.
>>
>> I may very well have mistakenly included watchdog_timer_fn in the list,
>> but as you describe it is inconsequential. If there are timers that do
>> not set need_resched, and that itself is not considered a bug, then
>> there should be another break condition.
>> I suppose it is a good idea
>> to differentiate between the need for rescheduling and the need to
>> be able to go in another sleep state.
>
> See patch below. I think both conditions are satisfied.
>
>> What do you think about the idea to use idle_expires?
>> Although on second thought that may have issues regarding accuracy /
>> race conditions with the interrupt timer.
>
> For a couples of days now, and with excellent results, I have
> been testing variations on the following theme:
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 7416b16..4d17d3d 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -5,16 +5,31 @@
>  */
>
> #include <linux/cpuidle.h>
> +#include <linux/tick.h>
> #include <linux/sched.h>
> #include <linux/sched/idle.h>
>
> static int __cpuidle poll_idle(struct cpuidle_device *dev,
>                               struct cpuidle_driver *drv, int index)
> {
> +       unsigned int next_timer_us, i;
> +
>        local_irq_enable();
>        if (!current_set_polling_and_test()) {
> -               while (!need_resched())
> +               while (!need_resched()){
>                        cpu_relax();
> +
> +                       /* Occasionally check for a new and long expected residency time. */
> +                       if (!(i++ % 1024)) {
> +                               local_irq_disable();
> +                               next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
> +                               local_irq_enable();
> +                               /* need a better way to get threshold, including large margin */
> +                               /* We are only trying to catch really bad cases here.         */
> +                               if (next_timer_us > 100)
> +                               break;
> +                       }
> +               }
>         }
>         current_clr_polling();
>
>
> Trace example 1:
>
>       9 [005] d...  1749.232242: cpu_idle: state=4 cpu_id=5
> 1055985 [005] d...  1750.288228: cpu_idle: state=4294967295 cpu_id=5
>       3 [005] d.h.  1750.288231: local_timer_entry: vector=239
>       1 [005] d.h.  1750.288233: local_timer_exit: vector=239
>       5 [005] d...  1750.288238: cpu_idle: state=0 cpu_id=5
>       0 [005] d.h.  1750.288238: local_timer_entry: vector=239
>       0 [005] d.h.  1750.288239: hrtimer_expire_entry: hrtimer=ffff91ca5f354880 function=tick_sched_timer now=1749980002791
>       3 [005] d.h.  1750.288242: hrtimer_expire_exit: hrtimer=ffff91ca5f354880
>       0 [005] d.h.  1750.288243: local_timer_exit: vector=239
>       1 [005] ..s.  1750.288244: timer_expire_entry: timer=ffffffffb4770ee0 function=__prandom_timer now=4295329792
>       4 [005] ..s.  1750.288249: timer_expire_exit: timer=ffffffffb4770ee0
>       5 [005] ....  1750.288254: cpu_idle: state=4294967295 cpu_id=5
>
> "need_resched" is not set, but the next timer is far off, so poll_state.c with the above patch now exits.
> And properly now decides to go into idle state 4, because nothing is going to happen for an eternity.
>
>       1 [005] d...  1750.288256: cpu_idle: state=4 cpu_id=5
> 2087982 [005] d...  1752.376239: cpu_idle: state=4294967295 cpu_id=5
>       3 [005] d.h.  1752.376242: local_timer_entry: vector=239
>       0 [005] d.h.  1752.376243: local_timer_exit: vector=239
>       5 [005] d...  1752.376248: cpu_idle: state=1 cpu_id=5
>      15 [005] d...  1752.376263: cpu_idle: state=4294967295 cpu_id=5
>       0 [005] d.h.  1752.376263: local_timer_entry: vector=239
>       0 [005] d.h.  1752.376264: hrtimer_expire_entry: hrtimer=ffff91ca5f354a00 function=watchdog_timer_fn now=1752068001621
>       3 [005] dNh.  1752.376268: hrtimer_expire_exit: hrtimer=ffff91ca5f354a00
>
>
> Trace example 2:
>
>       4 [000] d...  1792.272757: cpu_idle: state=0 cpu_id=0
>       1 [000] d.h.  1792.272758: local_timer_entry: vector=239
>       0 [000] d.h.  1792.272759: hrtimer_expire_entry: hrtimer=ffff91ca5f214880 function=tick_sched_timer now=1791964002768
>       3 [000] d.h.  1792.272762: hrtimer_expire_exit: hrtimer=ffff91ca5f214880
>       0 [000] d.h.  1792.272762: local_timer_exit: vector=239
>
> The next timer is very short, so the poll_state.c loop does not exit.
> (even if it was going to exit, it might not have had time to. I didn't find a better example.)
>
>       0 [000] ..s.  1792.272763: timer_expire_entry: timer=ffff91ca4cde8478 function=dev_watchdog now=4295340288
>       3 [000] ..s.  1792.272766: timer_expire_exit: timer=ffff91ca4cde8478
>
> The next timer is very short, so the poll_state.c loop does not exit.
>
>       0 [000] d.s.  1792.272767: timer_expire_entry: timer=ffffffffc0997440 function=delayed_work_timer_fn now=4295340288
>       5 [000] dNs.  1792.272772: timer_expire_exit: timer=ffffffffc0997440
>
> This time "need_resched" is set. I assume it didn't have time to exit idle state 0 yet.
>
>       0 [000] dNs.  1792.272772: timer_expire_entry: timer=ffffffffb46faa40 function=delayed_work_timer_fn now=4295340288
>       0 [000] dNs.  1792.272773: timer_expire_exit: timer=ffffffffb46faa40
>
> Now it exits idle state 0.
>
>       7 [000] .N..  1792.272780: cpu_idle: state=4294967295 cpu_id=0
>      29 [000] d...  1792.272810: cpu_idle: state=4 cpu_id=0
>
> And properly now decides to go into idle state 4, because nothing is going to happen for awhile.
>
>   91949 [000] d...  1792.364760: cpu_idle: state=4294967295 cpu_id=0
>       3 [000] d.h.  1792.364763: local_timer_entry: vector=239
>       0 [000] d.h.  1792.364764: hrtimer_expire_entry: hrtimer=ffff91ca5f214a00 function=watchdog_timer_fn now=1792056006926

  parent reply	other threads:[~2017-11-24 17:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <a181bf42-9462-476c-6dcd-39fc7151957f@tu-dresden.de>
2017-07-27 12:50 ` [PATCH] cpuidle: Allow menu governor to enter deeper sleep states after some time Thomas Ilsche
2017-10-19  7:46   ` Len Brown
2017-10-20 16:31     ` Thomas Ilsche
2017-10-21 14:27     ` Doug Smythies
2017-10-20  0:17 ` Doug Smythies
2017-10-20 17:13   ` Thomas Ilsche
2017-10-21 14:28   ` Doug Smythies
2017-11-07 23:04     ` Thomas Ilsche
2017-11-08  4:53       ` Len Brown
2017-11-08  6:01         ` Yu Chen
2017-11-08 16:26         ` Doug Smythies
2017-11-08 16:26     ` Doug Smythies
2017-11-10 17:42     ` Doug Smythies
2017-11-14  6:12     ` Doug Smythies
2017-11-16 16:11       ` Thomas Ilsche
2017-11-16 22:47       ` Doug Smythies
2017-11-24 17:36       ` Doug Smythies [this message]
2017-12-02 12:56         ` Thomas Ilsche
2017-12-15 10:44           ` Thomas Ilsche
2017-12-15 14:23             ` Rafael J. Wysocki
2017-12-21  9:43               ` Thomas Ilsche
2017-12-22 19:37                 ` Rafael J. Wysocki
2017-12-15 16:16             ` Doug Smythies
2017-12-16  2:34               ` Rafael J. Wysocki
2017-11-25 16:30       ` Doug Smythies

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000a01d3654a$c4996990$4dcc3cb0$@net' \
    --to=dsmythies@telus.net \
    --cc=alex.shi@linaro.org \
    --cc=daniel.hackenberg@tu-dresden.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=lenb@kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mario.bielert@tu-dresden.de \
    --cc=mhaehnel@os.inf.tu-dresden.de \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=robert.schoene@tu-dresden.de \
    --cc=thomas.ilsche@tu-dresden.de \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).