public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	John Stultz <john.stultz@linaro.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Len Brown <len.brown@intel.com>,
	Rafael Wysocki <rafael.j.wysocki@intel.com>,
	Eduardo Valentin <edubezval@gmail.com>,
	Paul Turner <pjt@google.com>,
	jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH 3/4] sched: introduce synchronized idle injection
Date: Mon, 11 Jan 2016 13:50:05 -0800	[thread overview]
Message-ID: <20160111135005.49f38a45@icelake> (raw)
In-Reply-To: <alpine.DEB.2.11.1511201418580.3989@nanos>

On Fri, 20 Nov 2015 19:53:45 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> > This all makes the idle-injection muck hard depend on hres_active,
> > but I think that's a sane constraint anyway.  
> 
> It makes it actually depend on NO_HZ || HRES. It comes with a few
> other restrictions as well: I'm not going to support that for TSCs
> which stop in C3. It's ugly enough already and I really don't want to
> muck with the broadcast device.
> 
> One other thing is that the caller has to ensure that the unblock()
> happens in a timely manner. I really would recommend to just disable
> preemption around that machine on fire wait loop and screw the RT
> tasks. We screw them anyway when they depend on a timer.
> 
> Untested patch below.
> 
Sorry it took a while, but I tested the patch below with some small
tweaks. Here are the testing results and issues.

1. Block tick works well to improve idle efficiency, e.g. on an Ivy
Bridge 4 core 8 thread desktop system. Performance per watt is
better than without blocking the ticks. It is also more
consistent and predictable. For CPU bound floating point spin workload,
compare idle percentage from 0-50%. The test below is limited to
package C7. The improvement is likely better on newer CPUs or system
tuned with deeper package C-states.

W/O patch
==========
IdlePct    Perf    RAPL    WallPower PPW_RAPL
0 467.16 29.58 0.0 15.79
5 441.68 27.94 0.0 15.81
10 406.71 26.50 0.0 15.35
15 372.16 24.94 0.0 14.92
20 340.89 23.41 0.0 14.56
25 307.87 21.97 0.0 14.01
30 310.69 21.39 0.0 14.53
35 293.29 21.00 0.0 13.97
40 268.78 19.90 0.0 13.51
45 240.66 18.73 0.0 12.85
50 238.56 18.74 0.0 12.73

W/ patch
IdlePct    Perf    RAPL    WallPower PPW_RAPL
0 467.19 29.71 0.0 15.73
5 445.07 28.53 0.0 15.6
10 420.34 27.43 0.0 15.32
15 398.47 26.25 0.0 15.18
20 376.47 25.05 0.0 15.03
25 354.13 23.81 0.0 14.87
30 329.50 22.57 0.0 14.6
35 304.38 21.27 0.0 14.31
40 290.42 20.21 0.0 14.37
45 254.82 16.70 0.0 15.26
50 236.10 17.75 0.0 13.3

Note: PPW_RAPL= performance per watt measured by RAPL package power
meter.

If we just look at 20% forced idle, I can see significantly more deep
package C-states(PC7) residency.
		Pkg%pc2 	Pkg%pc6 	Pkg%pc7 
w/o patch	3.17		4.39   		13.21 
w/ patch	1.15   		1.43            17.24 

2. run into warning
[  109.035699] WARNING: CPU: 0 PID: 2748 at
kernel/time/clockevents.c:325 clockevents_program_event+0x102/0x110() 
[  109.035700] Current state: 0
I think it is caused by tick_throttler in CLOCK_EVT_STATE_DETACHED
state. perhaps fix it with below?
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1141,6 +1141,7 @@ static struct clock_event_device tick_throttler =
{
        .features       = CLOCK_EVT_FEAT_ONESHOT,
        .event_handler  = tick_throttling_handler,
        .set_next_event = tick_throttling_noop,
+       .state_use_accessors = CLOCK_EVT_STATE_ONESHOT,
        .mult           = 1,
        .shift          = 0,
        .irq            = -1,

3. typo? since we use ns as delay unit
@@ -1175,7 +1176,7 @@ int tick_nohz_block_tick(u64 delay)
        if (WARN_ON_ONCE(td->evtdev == &tick_throttler))
                goto out;
 
-       if (delay > td->evtdev->max_delta_ticks) {
+       if (delay > td->evtdev->max_delta_ns) 


4. I see random message about "sched: RT throttling activated", not
sure accounting is being affected. I never got close to 95% RT.

5. Random soft lockup, stall, detected by NMI watchdog, or RCU preempt,
still debugging but each idle duration is only 5 ticks. I also disabled
preemption as suggested.

Thanks,

Jacob

> Thanks,
> 
> 	tglx
> 
> 8<----------------
> Subject: tick/nohz: Allow blocking the tick to prevent fire
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 20 Nov 2015 14:28:17 +0100
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/tick.h     |    5 ++
>  kernel/time/Kconfig      |    5 ++
>  kernel/time/tick-sched.c |   99
> +++++++++++++++++++++++++++++++++++++++++++++++
> kernel/time/tick-sched.h |    7 ++- 4 files changed, 114
> insertions(+), 2 deletions(-)
> 
> Index: tip/include/linux/tick.h
> ===================================================================
> --- tip.orig/include/linux/tick.h
> +++ tip/include/linux/tick.h
> @@ -203,4 +203,9 @@ static inline void tick_nohz_task_switch
>  		__tick_nohz_task_switch();
>  }
>  
> +#ifdef CONFIG_TICK_THROTTLING
> +int tick_nohz_block_tick(u64 delay);
> +void tick_nohz_unblock_tick(void);
> +#endif
> +
>  #endif
> Index: tip/kernel/time/Kconfig
> ===================================================================
> --- tip.orig/kernel/time/Kconfig
> +++ tip/kernel/time/Kconfig
> @@ -60,6 +60,11 @@ menu "Timers subsystem"
>  config TICK_ONESHOT
>  	bool
>  
> +# Special functions for tick throttling to avoid fire extinguishers
> +config TICK_THROTTLING
> +       bool
> +       depends on TICK_ONESHOT
> +
>  config NO_HZ_COMMON
>  	bool
>  	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
> Index: tip/kernel/time/tick-sched.c
> ===================================================================
> --- tip.orig/kernel/time/tick-sched.c
> +++ tip/kernel/time/tick-sched.c
> @@ -1119,6 +1119,105 @@ void tick_setup_sched_timer(void)
>  }
>  #endif /* HIGH_RES_TIMERS */
>  
> +#ifdef CONFIG_TICK_THROTTLING
> +/*
> + * An interrupt might have been pending already, when we programmed
> + * the throttler time. Nothing to do here. The device is armed and
> the
> + * interrupt will fire again. If this is the real wakeup event then
> + * the unblock call will retrigger it.
> + */
> +static void tick_throttling_handler(struct clock_event_device *dev)
> +{
> +}
> +
> +static int tick_throttling_noop(unsigned long evt,
> +				struct clock_event_device *d)
> +{
> +	return 0;
> +}
> +
> +static struct clock_event_device tick_throttler = {
> +	.name		= "throttler",
> +	.features	= CLOCK_EVT_FEAT_ONESHOT,
> +	.event_handler	= tick_throttling_handler,
> +	.set_next_event	= tick_throttling_noop,
> +	.mult		= 1,
> +	.shift		= 0,
> +	.irq		= -1,
> +};
> +
> +/**
> + * tick_nohz_block_tick - Force block the tick to prevent fire
> + * @delay:	Defer the tick for X nano seconds
> + *
> + * This is a special interface for thermal emergencies. Do not use
> + * otherwise!  Make sure to call tick_nohz_block_tick() right after
> + * the delay ends to undo the damage.
> + */
> +int tick_nohz_block_tick(u64 delay)
> +{
> +	struct tick_device *td;
> +	unsigned long flags;
> +	int ret = -EBUSY;
> +	ktime_t until;
> +
> +	if (!tick_nohz_active)
> +		return -EBUSY;
> +
> +	local_irq_save(flags);
> +	td = this_cpu_ptr(&tick_cpu_device);
> +
> +	/* No way to do that with broadcast */
> +	if (td->evtdev->features & CLOCK_EVT_FEAT_C3STOP)
> +		goto out;
> +
> +	/* Yes, I do not trust people! */
> +	if (WARN_ON_ONCE(td->evtdev == &tick_throttler))
> +		goto out;
> +
> +	if (delay > td->evtdev->max_delta_ticks) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	until = ktime_add_ns(ktime_get(), delay);
> +	if (!tick_program_event(until, 0)) {
> +		td->real_evtdev = td->evtdev;
> +		td->evtdev = &tick_throttler;
> +		ret = 0;
> +	}
> +out:
> +	local_irq_restore(flags);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tick_nohz_block_tick);
> +
> +/**
> + * tick_nohz_force_unblock_tick - Undo the force blocking of the tick
> + *
> + * Pairs with tick_nohz_block_tick(). Can be called unconditionally
> + * even if the tick was not blocked by tick_nohz_block_tick().
> + */
> +void tick_nohz_unblock_tick(void)
> +{
> +	struct tick_device *td;
> +	unsigned long flags;
> +
> +	if (!tick_nohz_active)
> +		return;
> +
> +	local_irq_save(flags);
> +	td = this_cpu_ptr(&tick_cpu_device);
> +	if (td->evtdev == &tick_throttler) {
> +		td->evtdev = td->real_evtdev;
> +		/* Force a timer interrupt now */
> +		tick_program_event(ktime_get(), 1);
> +	}
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(tick_nohz_unblock_tick);
> +#endif /* TICK_THROTTLING */
> +
>  #if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
>  void tick_cancel_sched_timer(int cpu)
>  {
> Index: tip/kernel/time/tick-sched.h
> ===================================================================
> --- tip.orig/kernel/time/tick-sched.h
> +++ tip/kernel/time/tick-sched.h
> @@ -9,8 +9,11 @@ enum tick_device_mode {
>  };
>  
>  struct tick_device {
> -	struct clock_event_device *evtdev;
> -	enum tick_device_mode mode;
> +	struct clock_event_device	*evtdev;
> +	enum tick_device_mode		mode;
> +#ifdef CONFIG_TICK_THROTTLING
> +	struct clock_event_device	*real_evtdev;
> +#endif
>  };
>  
>  enum tick_nohz_mode {

[Jacob Pan]

  reply	other threads:[~2016-01-11 21:51 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-13 19:53 [PATCH 0/4] CFS idle injection Jacob Pan
2015-11-13 19:53 ` [PATCH 1/4] ktime: add a roundup function Jacob Pan
2015-11-13 20:11   ` John Stultz
2015-11-13 22:33     ` Jacob Pan
2015-11-13 20:13   ` Thomas Gleixner
2015-11-13 22:36     ` Jacob Pan
2015-11-13 19:53 ` [PATCH 2/4] timer: relax tick stop in idle entry Jacob Pan
2015-11-13 20:22   ` Thomas Gleixner
2015-11-13 22:24     ` Jacob Pan
2015-11-16 15:06       ` Thomas Gleixner
2015-11-16 21:51         ` Jacob Pan
2015-11-16 22:01           ` Thomas Gleixner
2015-11-17  0:09             ` Jacob Pan
2015-11-19 17:43               ` Jacob Pan
2015-11-19 19:06               ` Peter Zijlstra
2015-11-19 19:21                 ` Jacob Pan
2015-11-19 19:59                   ` Peter Zijlstra
2015-11-19 23:41                     ` Jacob Pan
2015-11-16 22:31           ` Paul E. McKenney
2015-11-16 23:05             ` Jacob Pan
2015-11-16 23:15             ` Jacob Pan
2015-11-16 23:28               ` Paul E. McKenney
2015-11-16 23:32                 ` Arjan van de Ven
2015-11-16 23:40                   ` Jacob Pan
2015-11-17  0:00                     ` Paul E. McKenney
2015-11-16 22:32           ` Josh Triplett
2015-11-16 23:26             ` Paul E. McKenney
2015-11-17  1:41               ` Josh Triplett
2015-11-17  2:53                 ` Paul E. McKenney
2015-11-17  2:57                   ` Arjan van de Ven
2015-11-17  5:04                     ` Paul E. McKenney
2015-11-17 10:24                       ` Peter Zijlstra
2015-11-17 12:57                         ` Jacob Pan
2015-11-17 13:49                           ` Paul E. McKenney
2015-11-13 19:53 ` [PATCH 3/4] sched: introduce synchronized idle injection Jacob Pan
2015-11-13 20:23   ` kbuild test robot
2015-11-18  8:36   ` Ingo Molnar
2015-11-18 10:35     ` Peter Zijlstra
2015-11-18 12:27       ` Morten Rasmussen
2015-11-18 12:49         ` Peter Zijlstra
2015-11-18 14:04           ` Morten Rasmussen
2015-11-18 14:52             ` Jacob Pan
2015-11-18 15:09               ` Morten Rasmussen
2015-11-18 15:11                 ` Jacob Pan
2015-11-18 15:21                   ` Thomas Gleixner
2015-11-18 17:03                     ` Jacob Pan
2015-11-18 16:04                   ` Morten Rasmussen
2015-11-27  9:17       ` Ingo Molnar
2015-11-18 14:10     ` Jacob Pan
2015-11-27  9:17       ` Ingo Molnar
2015-12-02 17:28         ` Jacob Pan
2015-11-18 14:19     ` Arjan van de Ven
2015-11-18 15:44   ` Morten Rasmussen
2015-11-18 15:51     ` Arjan van de Ven
2015-11-19 17:24       ` Morten Rasmussen
2015-11-19 20:09         ` Peter Zijlstra
2015-11-20  9:45           ` Thomas Gleixner
2015-11-20 10:20             ` Peter Zijlstra
2015-11-20 10:58               ` Thomas Gleixner
2015-11-20 12:54                 ` Peter Zijlstra
2015-11-20 18:53                   ` Thomas Gleixner
2016-01-11 21:50                     ` Jacob Pan [this message]
2016-01-19 12:06                       ` Ingo Molnar
2016-01-19 18:00                         ` Jacob Pan
2015-11-24 11:38                 ` Jacob Pan
2015-11-23 17:59         ` Jacob Pan
2015-11-23 17:56   ` Javi Merino
2015-11-23 18:07     ` Peter Zijlstra
2015-11-24  9:12       ` Javi Merino
2015-11-24 10:09         ` Peter Zijlstra
2015-11-24 11:10           ` Jacob Pan
2015-11-24 12:00             ` Javi Merino
2015-11-24 18:22               ` Jacob Pan
2015-11-25  9:41                 ` Javi Merino
2015-11-13 19:53 ` [PATCH 4/4] sched: add trace event for " Jacob Pan
2015-11-13 20:10   ` kbuild test robot
2015-11-19 14:39   ` Javi Merino
2015-11-19 15:35     ` Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160111135005.49f38a45@icelake \
    --to=jacob.jun.pan@linux.intel.com \
    --cc=arjan@linux.intel.com \
    --cc=edubezval@gmail.com \
    --cc=john.stultz@linaro.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox