[PATCH] sched/rt: don't try to balance rt_runtime when it is futile

linux-rt-users.vger.kernel.org archive mirror

* [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
@ 2014-05-14 15:08 Paul Gortmaker
  2014-05-14 15:44 ` Paul E. McKenney
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-14 15:08 UTC
  To: linux-kernel
  Cc: linux-rt-users, Paul Gortmaker, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Paul E. McKenney

As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
("sched: rt-group: smp balancing") the concept of borrowing per
cpu rt_runtime from one core to another was introduced.

However, this prevents the RT throttling message from ever being
emitted when someone does a common (but mistaken) attempt at
using too much CPU in RT context.  Consider the following test:

  echo "int main(void) {for(;;);}" > full_load.c
  gcc full_load.c -o full_load
  taskset -c 1 ./full_load &
  chrt -r -p 80 `pidof full_load`

When run on x86_64 defconfig, what happens is as follows:

-task runs on core1 for 95% of an rt_period as documented in
 the file Documentation/scheduler/sched-rt-group.txt

-at 95%, the code in balance_runtime sees this threshold and
 calls do_balance_runtime()

-do_balance_runtime sees that core 1 is in need, and does this:
	---------------
        if (rt_rq->rt_runtime + diff > rt_period)
                diff = rt_period - rt_rq->rt_runtime;
        iter->rt_runtime -= diff;
        rt_rq->rt_runtime += diff;
	---------------
 which extends core1's rt_runtime by 5%, making it 100% of rt_period
 by stealing 5% from core0 (or possibly some other core).
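
The arithmetic in the steps above can be checked with a minimal user-space
sketch.  It is not the kernel code: struct model_rt_rq and model_borrow() are
hypothetical names, and the requested amount "diff" is passed in by the caller
instead of being derived from the donor's spare runtime as
do_balance_runtime() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one rt_rq field this model needs. */
struct model_rt_rq {
	uint64_t rt_runtime;	/* RT budget per period, in ns */
};

/*
 * Move up to "diff" ns of budget from donor to borrower, clamped so the
 * borrower's budget never exceeds rt_period.  Returns the amount moved.
 */
static uint64_t model_borrow(struct model_rt_rq *borrower,
			     struct model_rt_rq *donor,
			     uint64_t diff, uint64_t rt_period)
{
	/* Preconditions of this model only, not checks the kernel makes. */
	assert(borrower->rt_runtime <= rt_period);
	assert(diff <= donor->rt_runtime);

	if (borrower->rt_runtime + diff > rt_period)
		diff = rt_period - borrower->rt_runtime;
	donor->rt_runtime -= diff;
	borrower->rt_runtime += diff;
	return diff;
}
```

With a 1 s period and a 950 ms budget on both cores, a 950 ms request is
clamped to 50 ms: the borrower ends up with a budget equal to the full period
and the donor is left with 900 ms.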

However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
we hit this near the top of that function:
	---------------
        if (runtime >= sched_rt_period(rt_rq))
                return 0;
	---------------
and hence we'll _never_ look at/set any of the throttling checks and
messages in sched_rt_runtime_exceeded().  Instead, we will happily
plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
the RCU subsystem will get angry and trigger an NMI in response to
what it rightly sees as a WTF situation.
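
Why the early return hides the throttle can be shown with a reduced model of
the check.  This is a sketch under assumptions: model_runtime_exceeded() is a
hypothetical name, and it keeps only the early return quoted above plus the
final rt_time comparison, leaving out the rt_throttled, RUNTIME_INF and
balance_runtime() handling of the real function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 when the model would throttle (and print the message), 0 when
 * it would not.  All values are in ns.
 */
static int model_runtime_exceeded(uint64_t rt_time, uint64_t rt_runtime,
				  uint64_t rt_period)
{
	assert(rt_period > 0);	/* model precondition */

	/* Budget covers the whole period: nothing below is ever evaluated. */
	if (rt_runtime >= rt_period)
		return 0;

	/* Only reached while the budget is smaller than the period. */
	return rt_time > rt_runtime;
}
```

With the original 950 ms budget, 960 ms of accumulated rt_time trips the
throttle.  Once the budget has been inflated to the full 1 s period, no amount
of rt_time does.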

Granted, there are lots of ways you can do bad things to yourself with
RT, but in the current zeitgeist of multicore systems with people
dedicating individual cores to individual tasks, I'd say the above is
common enough that we should react to it sensibly, and an RCU stall
really doesn't translate well to an end user vs a simple message that
says "throttling activated".

One way to get the throttle message instead of the ambiguous and lengthy
NMI triggered all core backtrace of the RCU stall is to change the
SCHED_FEAT(RT_RUNTIME_SHARE, true) to false.  One could make a good
case for this being the default for the out-of-tree preempt-rt series,
since folks using that are more apt to be manually tuning the system
and won't want an invisible hand coming in and making changes.

However, in mainline, where it is more likely that there will be
n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
and still avoid the RCU stalls by noting that there is no point in
trying to balance when there are no tasks to migrate, or only a
single RT task is present.  Inflating the rt_runtime does nothing
in this case other than defeat sched_rt_runtime_exceeded().

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
 and tglx asked why they wouldn't see the throttle message; it is only
 now that I had a chance to dig in and figure out why.  Oh, and the patch
 is against linux-next, in case that matters...]

 kernel/sched/rt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ea4d500..698aac9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
 	if (!sched_feat(RT_RUNTIME_SHARE))
 		return more;

+	/*
+	 * Stealing from another core won't help us at all if
+	 * we have nothing to migrate over there, or only one
+	 * task that is running up all the rt_time.  In fact it
+	 * will just inhibit the throttling message in that case.
+	 */
+	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
+		return more;
+
 	if (rt_rq->rt_time > rt_rq->rt_runtime) {
 		raw_spin_unlock(&rt_rq->rt_runtime_lock);
 		more = do_balance_runtime(rt_rq);
-- 
1.8.2.3
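
The condition added by the hunk above can be exercised in isolation.  The
following is a user-space sketch, not kernel code: struct rt_rq_counts and
model_balance_is_futile() are hypothetical names that mirror only the two
counters the new test reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two rt_rq counters used by the new check. */
struct rt_rq_counts {
	unsigned long rt_nr_migratory;	/* queued RT tasks allowed on >1 CPU */
	unsigned long rt_nr_total;	/* all queued RT tasks */
};

/* True when borrowing runtime cannot help, so balancing should be skipped. */
static bool model_balance_is_futile(const struct rt_rq_counts *rt_rq)
{
	/* Migratory tasks are a subset of all tasks (model precondition). */
	assert(rt_rq->rt_nr_migratory <= rt_rq->rt_nr_total);

	return !rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1;
}
```

The pinned hog from the test case (taskset -c 1, so zero migratory tasks and
one task in total) is reported as futile, as is a lone unpinned task.  A
runqueue with three RT tasks, two of them migratory, still goes on to
do_balance_runtime().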


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
@ 2014-05-14 15:44 ` Paul E. McKenney
  2014-05-14 19:11   ` Paul Gortmaker
  2014-05-15  3:18   ` Mike Galbraith
  2014-05-19 12:40 ` Peter Zijlstra
  2014-11-27 11:21 ` Wanpeng Li
  2 siblings, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-14 15:44 UTC
  To: Paul Gortmaker
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner

On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
> 
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context.  Consider the following test:
> 
>   echo "main() {for(;;);}" > full_load.c
>   gcc full_load.c -o full_load
>   taskset -c 1 ./full_load &
>   chrt -r -p 80 `pidof full_load`
> 
> When run on x86_64 defconfig, what happens is as follows:
> 
> -task runs on core1 for 95% of an rt_period as documented in
>  the file Documentation/scheduler/sched-rt-group.txt
> 
> -at 95%, the code in balance_runtime sees this threshold and
>  calls do_balance_runtime()
> 
> -do_balance_runtime sees that core 1 is in need, and does this:
> 	---------------
>         if (rt_rq->rt_runtime + diff > rt_period)
>                 diff = rt_period - rt_rq->rt_runtime;
>         iter->rt_runtime -= diff;
>         rt_rq->rt_runtime += diff;
> 	---------------
>  which extends core1's rt_runtime by 5%, making it 100% of rt_period
>  by stealing 5% from core0 (or possibly some other core).
> 
> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> we hit this near the top of that function:
> 	---------------
>         if (runtime >= sched_rt_period(rt_rq))
>                 return 0;
> 	---------------
> and hence we'll _never_ look at/set any of the throttling checks and
> messages in sched_rt_runtime_exceeded().  Instead, we will happily
> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> the RCU subsystem will get angry and trigger an NMI in response to
> what it rightly sees as a WTF situation.

In theory, one way of making RCU OK with an RT usermode CPU hog is to
build with Frederic's CONFIG_NO_HZ_FULL=y.  This will cause RCU to see
CPUs having a single runnable usermode task as idle, preventing the RCU
CPU stall warning.  This does work well for mainline kernel in the lab.

In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
for -rt kernels in production environments.

But leaving practice aside for the moment...

> Granted, there are lots of ways you can do bad things to yourself with
> RT, but in the current zeitgeist of multicore systems with people
> dedicating individual cores to individual tasks, I'd say the above is
> common enough that we should react to it sensibly, and an RCU stall
> really doesn't translate well to an end user vs a simple message that
> says "throttling activated".
> 
> One way to get the throttle message instead of the ambiguous and lengthy
> NMI triggered all core backtrace of the RCU stall is to change the
> SCHED_FEAT(RT_RUNTIME_SHARE, true) to false.  One could make a good
> case for this being the default for the out-of-tree preempt-rt series,
> since folks using that are more apt to be manually tuning the system
> and won't want an invisible hand coming in and making changes.
> 
> However, in mainline, where it is more likely that there will be
> n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
> and still avoid the RCU stalls by noting that there is no point in
> trying to balance when there are no tasks to migrate, or only a
> single RT task is present.  Inflating the rt_runtime does nothing
> in this case other than defeat sched_rt_runtime_exceeded().
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> 
> [I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
>  and tglx asked why they wouldn't see the throttle message; it is only
>  now that I had a chance to dig in and figure out why.  Oh, and the patch
>  is against linux-next, in case that matters...]
> 
>  kernel/sched/rt.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ea4d500..698aac9 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
>  	if (!sched_feat(RT_RUNTIME_SHARE))
>  		return more;
> 
> +	/*
> +	 * Stealing from another core won't help us at all if
> +	 * we have nothing to migrate over there, or only one
> +	 * task that is running up all the rt_time.  In fact it
> +	 * will just inhibit the throttling message in that case.
> +	 */
> +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)

How about something like the following to take NO_HZ_FULL into account?

+	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
+	    !tick_nohz_full_cpu(cpu))

							Thanx, Paul

> +		return more;
> +
>  	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>  		more = do_balance_runtime(rt_rq);
> -- 
> 1.8.2.3
> 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:44 ` Paul E. McKenney
@ 2014-05-14 19:11   ` Paul Gortmaker
  2014-05-14 19:27     ` Paul E. McKenney
                       ` (2 more replies)
  2014-05-15  3:18   ` Mike Galbraith
  1 sibling, 3 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-14 19:11 UTC
  To: Paul E. McKenney
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

[Added Frederic to Cc: since we are now talking nohz stuff]

[Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile] On 14/05/2014 (Wed 08:44) Paul E. McKenney wrote:

> On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
> > As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> > ("sched: rt-group: smp balancing") the concept of borrowing per
> > cpu rt_runtime from one core to another was introduced.
> > 
> > However, this prevents the RT throttling message from ever being
> > emitted when someone does a common (but mistaken) attempt at
> > using too much CPU in RT context.  Consider the following test:
> > 
> >   echo "main() {for(;;);}" > full_load.c
> >   gcc full_load.c -o full_load
> >   taskset -c 1 ./full_load &
> >   chrt -r -p 80 `pidof full_load`
> > 
> > When run on x86_64 defconfig, what happens is as follows:
> > 
> > -task runs on core1 for 95% of an rt_period as documented in
> >  the file Documentation/scheduler/sched-rt-group.txt
> > 
> > -at 95%, the code in balance_runtime sees this threshold and
> >  calls do_balance_runtime()
> > 
> > -do_balance_runtime sees that core 1 is in need, and does this:
> > 	---------------
> >         if (rt_rq->rt_runtime + diff > rt_period)
> >                 diff = rt_period - rt_rq->rt_runtime;
> >         iter->rt_runtime -= diff;
> >         rt_rq->rt_runtime += diff;
> > 	---------------
> >  which extends core1's rt_runtime by 5%, making it 100% of rt_period
> >  by stealing 5% from core0 (or possibly some other core).
> > 
> > However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> > we hit this near the top of that function:
> > 	---------------
> >         if (runtime >= sched_rt_period(rt_rq))
> >                 return 0;
> > 	---------------
> > and hence we'll _never_ look at/set any of the throttling checks and
> > messages in sched_rt_runtime_exceeded().  Instead, we will happily
> > plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> > the RCU subsystem will get angry and trigger an NMI in response to
> > what it rightly sees as a WTF situation.
> 
> In theory, one way of making RCU OK with an RT usermode CPU hog is to
> build with Frederic's CONFIG_NO_HZ_FULL=y.  This will cause RCU to see
> CPUs having a single runnable usermode task as idle, preventing the RCU
> CPU stall warning.  This does work well for mainline kernel in the lab.

Agreed; wanting to test that locally for myself meant moving to a more
modern machine, as the older PentiumD doesn't support NO_HZ_FULL.  But
on the newer box (dual socket six cores in each) I found the stall
harder to trigger w/o going back to using the threadirqs boot arg as
used in the earlier lkml post referenced below. (Why?  Not sure...)

Once I did that though (boot vanilla linux-next with threadirqs) I
confirmed what you said; i.e. that we would reliably get a stall with
the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also
RCU_USER_QS=y).

> 
> In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> for -rt kernels in production environments.
> 
> But leaving practice aside for the moment...
> 

[...]

> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index ea4d500..698aac9 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
> >  	if (!sched_feat(RT_RUNTIME_SHARE))
> >  		return more;
> > 
> > +	/*
> > +	 * Stealing from another core won't help us at all if
> > +	 * we have nothing to migrate over there, or only one
> > +	 * task that is running up all the rt_time.  In fact it
> > +	 * will just inhibit the throttling message in that case.
> > +	 */
> > +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
> 
> How about something like the following to take NO_HZ_FULL into account?
> 
> +	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
> +	    !tick_nohz_full_cpu(cpu))

Yes, I think special casing nohz_full can make sense, but maybe not
exactly here in balance_runtime?  The underlying reasoning doesn't
change on nohz_full: if only one task is present, or nothing can
migrate, then the call to do_balance_runtime is largely useless.  We'll
walk possibly all cpus in search of an rt_rq to steal from, and what we
steal, we can't use, so we've artificially crippled the other rt_rq for
nothing other than to artificially inflate our rt_runtime and thus allow
100% usage.

Given that, perhaps a separate change to sched_rt_runtime_exceeded()
that works out the CPU from the rt_rq, and returns zero if it is a
nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
get the throttling message even if they go 100%.
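
As a rough user-space sketch of that idea (not a proposed patch): the names
below are hypothetical, the nohz_full set is modelled as a plain 64-bit mask
where the kernel would ask tick_nohz_full_cpu(), and only the early return and
the final rt_time comparison of sched_rt_runtime_exceeded() are kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit N of the mask set means CPU N is nohz_full. */
static int model_cpu_is_nohz_full(int cpu, uint64_t nohz_full_mask)
{
	assert(cpu >= 0 && cpu < 64);	/* model handles at most 64 CPUs */
	return (int)((nohz_full_mask >> cpu) & 1);
}

/* Returns 1 when the model would throttle, 0 otherwise.  Times in ns. */
static int model_runtime_exceeded_nohz(uint64_t rt_time, uint64_t rt_runtime,
				       uint64_t rt_period, int cpu,
				       uint64_t nohz_full_mask)
{
	/* Suggested special case: never throttle a nohz_full CPU. */
	if (model_cpu_is_nohz_full(cpu, nohz_full_mask))
		return 0;

	/* Existing early return when the budget covers the whole period. */
	if (rt_runtime >= rt_period)
		return 0;

	return rt_time > rt_runtime;
}
```

With CPU 1 in the mask, 960 ms of rt_time against a 950 ms budget is not
throttled on CPU 1 but still is on CPU 0.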

Paul.
--

> 
> 							Thanx, Paul
> 
> > +		return more;
> > +
> >  	if (rt_rq->rt_time > rt_rq->rt_runtime) {
> >  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
> >  		more = do_balance_runtime(rt_rq);
> > -- 
> > 1.8.2.3
> > 
> 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 19:11   ` Paul Gortmaker
@ 2014-05-14 19:27     ` Paul E. McKenney
  2014-05-15  2:49     ` Mike Galbraith
  2014-11-27 11:36     ` Wanpeng Li
  2 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-14 19:27 UTC
  To: Paul Gortmaker
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, May 14, 2014 at 03:11:00PM -0400, Paul Gortmaker wrote:
> [Added Frederic to Cc: since we are now talking nohz stuff]
> 
> [Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile] On 14/05/2014 (Wed 08:44) Paul E. McKenney wrote:
> 
> > On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
> > > As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> > > ("sched: rt-group: smp balancing") the concept of borrowing per
> > > cpu rt_runtime from one core to another was introduced.
> > > 
> > > However, this prevents the RT throttling message from ever being
> > > emitted when someone does a common (but mistaken) attempt at
> > > using too much CPU in RT context.  Consider the following test:
> > > 
> > >   echo "main() {for(;;);}" > full_load.c
> > >   gcc full_load.c -o full_load
> > >   taskset -c 1 ./full_load &
> > >   chrt -r -p 80 `pidof full_load`
> > > 
> > > When run on x86_64 defconfig, what happens is as follows:
> > > 
> > > -task runs on core1 for 95% of an rt_period as documented in
> > >  the file Documentation/scheduler/sched-rt-group.txt
> > > 
> > > -at 95%, the code in balance_runtime sees this threshold and
> > >  calls do_balance_runtime()
> > > 
> > > -do_balance_runtime sees that core 1 is in need, and does this:
> > > 	---------------
> > >         if (rt_rq->rt_runtime + diff > rt_period)
> > >                 diff = rt_period - rt_rq->rt_runtime;
> > >         iter->rt_runtime -= diff;
> > >         rt_rq->rt_runtime += diff;
> > > 	---------------
> > >  which extends core1's rt_runtime by 5%, making it 100% of rt_period
> > >  by stealing 5% from core0 (or possibly some other core).
> > > 
> > > However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> > > we hit this near the top of that function:
> > > 	---------------
> > >         if (runtime >= sched_rt_period(rt_rq))
> > >                 return 0;
> > > 	---------------
> > > and hence we'll _never_ look at/set any of the throttling checks and
> > > messages in sched_rt_runtime_exceeded().  Instead, we will happily
> > > plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> > > the RCU subsystem will get angry and trigger an NMI in response to
> > > what it rightly sees as a WTF situation.
> > 
> > In theory, one way of making RCU OK with an RT usermode CPU hog is to
> > build with Frederic's CONFIG_NO_HZ_FULL=y.  This will cause RCU to see
> > CPUs having a single runnable usermode task as idle, preventing the RCU
> > CPU stall warning.  This does work well for mainline kernel in the lab.
> 
> Agreed; wanting to test that locally for myself meant moving to a more
> modern machine, as the older PentiumD doesn't support NO_HZ_FULL.  But
> on the newer box (dual socket six cores in each) I found the stall
> harder to trigger w/o going back to using the threadirqs boot arg as
> used in the earlier lkml post referenced below. (Why?  Not sure...)
> 
> Once I did that though (boot vanilla linux-next with threadirqs) I
> confirmed what you said; i.e. that we would reliably get a stall with
> the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also
> RCU_USER_QS=y).

Nice!!!  Thank you for checking this out!

> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
> > 
> > But leaving practice aside for the moment...
> > 
> 
> [...]
> 
> > > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > > index ea4d500..698aac9 100644
> > > --- a/kernel/sched/rt.c
> > > +++ b/kernel/sched/rt.c
> > > @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
> > >  	if (!sched_feat(RT_RUNTIME_SHARE))
> > >  		return more;
> > > 
> > > +	/*
> > > +	 * Stealing from another core won't help us at all if
> > > +	 * we have nothing to migrate over there, or only one
> > > +	 * task that is running up all the rt_time.  In fact it
> > > +	 * will just inhibit the throttling message in that case.
> > > +	 */
> > > +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
> > 
> > How about something like the following to take NO_HZ_FULL into account?
> > 
> > +	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
> > +	    !tick_nohz_full_cpu(cpu))
> 
> Yes, I think special casing nohz_full can make sense, but maybe not
> exactly here in balance_runtime?  The underlying reasoning doesn't
> change on nohz_full: if only one task is present, or nothing can
> migrate, then the call to do_balance_runtime is largely useless.  We'll
> walk possibly all cpus in search of an rt_rq to steal from, and what we
> steal, we can't use, so we've artificially crippled the other rt_rq for
> nothing other than to artificially inflate our rt_runtime and thus allow
> 100% usage.
> 
> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
> get the throttling message even if they go 100%.

Makes sense to me!  Then again, I am no scheduler expert.

							Thanx, Paul

> Paul.
> --
> 
> > 
> > 							Thanx, Paul
> > 
> > > +		return more;
> > > +
> > >  	if (rt_rq->rt_time > rt_rq->rt_runtime) {
> > >  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
> > >  		more = do_balance_runtime(rt_rq);
> > > -- 
> > > 1.8.2.3
> > > 
> > 
> 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 19:11   ` Paul Gortmaker
  2014-05-14 19:27     ` Paul E. McKenney
@ 2014-05-15  2:49     ` Mike Galbraith
  2014-05-15 14:09       ` Paul Gortmaker
  2014-11-27  9:17       ` Wanpeng Li
  2014-11-27 11:36     ` Wanpeng Li
  2 siblings, 2 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-05-15  2:49 UTC
  To: Paul Gortmaker
  Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
	Frederic Weisbecker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

On Wed, 2014-05-14 at 15:11 -0400, Paul Gortmaker wrote:

> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
> get the throttling message even if they go 100%.

I don't get it.  What reason would there be to run a hog on a dedicated
core as realtime policy/priority?  Given no competition, there's nothing
to prioritize, you could just as well run a critical task as SCHED_IDLE.

I would also expect that anyone wanting bare metal will have all of
their critical cores isolated from the scheduler, watchdogs turned off
as well as that noisy throttle, the whole point being to make as much
silent as possible.  Seems to me tick_nohz_full_cpu(cpu) should be
predicated by that cpu being isolated from the #1 noise source, the
scheduler and its load balancing.  There's just no point to nohz_full
without that, or if there is, I sure don't see it.

When I see people trying to run a hog as a realtime task, it's because
they are trying in vain to keep competition away from precious cores..
and one mlockall with a realtime hog blocking flush_work() gives them a
wakeup call.

-Mike


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  2:49     ` Mike Galbraith
@ 2014-05-15 14:09       ` Paul Gortmaker
  2014-11-27  9:17       ` Wanpeng Li
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-15 14:09 UTC
  To: Mike Galbraith
  Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
	Frederic Weisbecker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

On 14-05-14 10:49 PM, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 15:11 -0400, Paul Gortmaker wrote:
> 
>> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
>> that works out the CPU from the rt_rq, and returns zero if it is a
>> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
>> get the throttling message even if they go 100%.
> 
> I don't get it.  What reason would there be to run a hog on a dedicated
> core as realtime policy/priority?  Given no competition, there's nothing
> to prioritize, you could just as well run a critical task as SCHED_IDLE.

Well, as per the original commit log, we acknowledge that people will do
stupid things that don't make 100% sense, and when they do, we should
ideally behave in a sane fashion in response to that.  And I don't think
that "no competition" is a given for most folks.  They see all these
internal threads running and just figure they can chrt their way to a
solution, vs. taking the time to clean up, enable RCU_NOCB etc etc.
Don't get me wrong; I'm not defending such behaviour...

> 
> I would also expect that anyone wanting bare metal will have all of
> their critical cores isolated from the scheduler, watchdogs turned off
> as well as that noisy throttle, the whole point being to make as much
> silent as possible.  Seems to me tick_nohz_full_cpu(cpu) should be
> predicated by that cpu being isolated from the #1 noise source, the
> scheduler and its load balancing.  There's just no point to nohz_full
> without that, or if there is, I sure don't see it.

An interesting point.  One could argue that the default for the nohz_full
cores should be to be isolated from the scheduler, vs needing to be
manually excluded.

P.
--

> 
> When I see people trying to run a hog as a realtime task, it's because
> they are trying in vain to keep competition away from precious cores..
> and one mlockall with a realtime hog blocking flush_work() gives them a
> wakeup call.
> 
> -Mike
> 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  2:49     ` Mike Galbraith
  2014-05-15 14:09       ` Paul Gortmaker
@ 2014-11-27  9:17       ` Wanpeng Li
  2014-11-27 15:31         ` Mike Galbraith
  1 sibling, 1 reply; 31+ messages in thread
From: Wanpeng Li @ 2014-11-27  9:17 UTC
  To: Mike Galbraith, Paul Gortmaker
  Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
	Frederic Weisbecker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

Hi Mike,
On 5/15/14, 10:49 AM, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 15:11 -0400, Paul Gortmaker wrote:
>
>> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
>> that works out the CPU from the rt_rq, and returns zero if it is a
>> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
>> get the throttling message even if they go 100%.
> I don't get it.  What reason would there be to run a hog on a dedicated
> core as realtime policy/priority?  Given no competition, there's nothing
> to prioritize, you could just as well run a critical task as SCHED_IDLE.
>
> I would also expect that anyone wanting bare metal will have all of
> their critical cores isolated from the scheduler, watchdogs turned off
> as well as that noisy throttle, the whole point being to make as much
> silent as possible.  Seems to me tick_nohz_full_cpu(cpu) should be
> predicated by that cpu being isolated from the #1 noise source, the
> scheduler and its load balancing.  There's just no point to nohz_full
> without that, or if there is, I sure don't see it.

Does the tick still need to be handled if the cpu is isolated w/o nohz full
enabled?

Regards,
Wanpeng Li

>
> When I see people trying to run a hog as a realtime task, it's because
> they are trying in vain to keep competition away from precious cores..
> and one mlockall with a realtime hog blocking flush_work() gives them a
> wakeup call.
>
> -Mike
>



* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-11-27  9:17       ` Wanpeng Li
@ 2014-11-27 15:31         ` Mike Galbraith
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-11-27 15:31 UTC
  To: Wanpeng Li
  Cc: Paul Gortmaker, Paul E. McKenney, linux-kernel, linux-rt-users,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner

On Thu, 2014-11-27 at 17:17 +0800, Wanpeng Li wrote:

> Does the tick still need to be handled if the cpu is isolated w/o nohz full
> enabled?

"No" would lead to some awkward questions.

	-Mike


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 19:11   ` Paul Gortmaker
  2014-05-14 19:27     ` Paul E. McKenney
  2014-05-15  2:49     ` Mike Galbraith
@ 2014-11-27 11:36     ` Wanpeng Li
  2 siblings, 0 replies; 31+ messages in thread
From: Wanpeng Li @ 2014-11-27 11:36 UTC
  To: Paul Gortmaker, Paul E. McKenney
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Hi Paul,
On 5/15/14, 3:11 AM, Paul Gortmaker wrote:
> [Added Frederic to Cc: since we are now talking nohz stuff]
>
> [Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile] On 14/05/2014 (Wed 08:44) Paul E. McKenney wrote:
>
>> On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
>>> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
>>> ("sched: rt-group: smp balancing") the concept of borrowing per
>>> cpu rt_runtime from one core to another was introduced.
>>>
>>> However, this prevents the RT throttling message from ever being
>>> emitted when someone does a common (but mistaken) attempt at
>>> using too much CPU in RT context.  Consider the following test:
>>>
>>>    echo "main() {for(;;);}" > full_load.c
>>>    gcc full_load.c -o full_load
>>>    taskset -c 1 ./full_load &
>>>    chrt -r -p 80 `pidof full_load`
>>>
>>> When run on x86_64 defconfig, what happens is as follows:
>>>
>>> -task runs on core1 for 95% of an rt_period as documented in
>>>   the file Documentation/scheduler/sched-rt-group.txt
>>>
>>> -at 95%, the code in balance_runtime sees this threshold and
>>>   calls do_balance_runtime()
>>>
>>> -do_balance_runtime sees that core 1 is in need, and does this:
>>> 	---------------
>>>          if (rt_rq->rt_runtime + diff > rt_period)
>>>                  diff = rt_period - rt_rq->rt_runtime;
>>>          iter->rt_runtime -= diff;
>>>          rt_rq->rt_runtime += diff;
>>> 	---------------
>>>   which extends core1's rt_runtime by 5%, making it 100% of rt_period
>>>   by stealing 5% from core0 (or possibly some other core).
>>>
>>> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
>>> we hit this near the top of that function:
>>> 	---------------
>>>          if (runtime >= sched_rt_period(rt_rq))
>>>                  return 0;
>>> 	---------------
>>> and hence we'll _never_ look at/set any of the throttling checks and
>>> messages in sched_rt_runtime_exceeded().  Instead, we will happily
>>> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
>>> the RCU subsystem will get angry and trigger an NMI in response to
>>> what it rightly sees as a WTF situation.
>> In theory, one way of making RCU OK with an RT usermode CPU hog is to
>> build with Frederic's CONFIG_NO_HZ_FULL=y.  This will cause RCU to see
>> CPUs having a single runnable usermode task as idle, preventing the RCU
>> CPU stall warning.  This does work well for mainline kernel in the lab.
> Agreed; wanting to test that locally for myself meant moving to a more
> modern machine, as the older PentiumD doesn't support NO_HZ_FULL.  But

Could you point out which hw feature supports NO_HZ_FULL? How can I check
for it through cpuid?

Regards,
Wanpeng Li

> on the newer box (dual socket six cores in each) I found the stall
> harder to trigger w/o going back to using the threadirqs boot arg as
> used in the earlier lkml post referenced below. (Why?  Not sure...)
>
> Once I did that though (boot vanilla linux-next with threadirqs) I
> confirmed what you said; i.e. that we would reliably get a stall with
> the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also
> RCU_USER_QS=y).
>
>> In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
>> for -rt kernels in production environments.
>>
>> But leaving practice aside for the moment...
>>
> [...]
>
>>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>>> index ea4d500..698aac9 100644
>>> --- a/kernel/sched/rt.c
>>> +++ b/kernel/sched/rt.c
>>> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
>>>   	if (!sched_feat(RT_RUNTIME_SHARE))
>>>   		return more;
>>>
>>> +	/*
>>> +	 * Stealing from another core won't help us at all if
>>> +	 * we have nothing to migrate over there, or only one
>>> +	 * task that is running up all the rt_time.  In fact it
>>> +	 * will just inhibit the throttling message in that case.
>>> +	 */
>>> +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
>> How about something like the following to take NO_HZ_FULL into account?
>>
>> +	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
>> +	    !tick_nohz_full_cpu(cpu))
> Yes, I think special casing nohz_full can make sense, but maybe not
> exactly here in balance_runtime?  The underlying reasoning doesn't
> change on nohz_full: if only one task is present, or nothing can
> migrate, then the call to do_balance_runtime is largely useless - we'll
> walk possibly all cpus in search of an rt_rq to steal from, and what we
> steal, we can't use - so we've artificially crippled the other rt_rq for
> nothing other than to artificially inflate our rt_runtime and thus allow
> 100% usage.
>
> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
> get the throttling message even if they go 100%.
>
> Paul.
> --
>
>> 							Thanx, Paul
>>
>>> +		return more;
>>> +
>>>   	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>>>   		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>>>   		more = do_balance_runtime(rt_rq);
>>> -- 
>>> 1.8.2.3
>>>
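
The interaction discussed in the quoted patch can be sketched as a small
user-space model.  This is only an illustration, not kernel code: the
struct name, the helper names, the nohz_full flag and the 1 s period in
microseconds are assumptions.  Only the field names and the three
conditions are taken from the snippets quoted in this thread.

```c
/* Hypothetical user-space model of the behaviour described in the patch
 * description; field names follow the quoted kernel snippets, everything
 * else (struct name, helpers, microsecond units) is assumed. */

#define RT_PERIOD_US 1000000LL   /* assumed 1 s rt_period */

struct rt_rq_model {
	long long rt_time;       /* RT time consumed in this period */
	long long rt_runtime;    /* RT budget for this period */
	int rt_nr_migratory;     /* RT tasks that could move to another CPU */
	int rt_nr_total;         /* all RT tasks queued here */
	int nohz_full;           /* assumed flag: CPU is a nohz_full CPU */
};

/* Same clamp-and-transfer arithmetic as the quoted do_balance_runtime() lines. */
static void borrow_runtime(struct rt_rq_model *rt_rq, struct rt_rq_model *iter,
			   long long diff)
{
	if (rt_rq->rt_runtime + diff > RT_PERIOD_US)
		diff = RT_PERIOD_US - rt_rq->rt_runtime;
	iter->rt_runtime -= diff;
	rt_rq->rt_runtime += diff;
}

/* The condition the patch adds to balance_runtime(): borrowing cannot help. */
static int balancing_is_futile(const struct rt_rq_model *rt_rq)
{
	return !rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1;
}

/* Returns 1 when the runqueue would be throttled, 0 otherwise. */
static int runtime_exceeded(const struct rt_rq_model *rt_rq)
{
	if (rt_rq->nohz_full)                   /* suggested nohz_full special case */
		return 0;
	if (rt_rq->rt_runtime >= RT_PERIOD_US)  /* early return quoted in the patch description */
		return 0;
	return rt_rq->rt_time > rt_rq->rt_runtime;
}
```

In this model, a runqueue at 95% that borrows the remaining 5% ends up
with rt_runtime equal to the period, so runtime_exceeded() returns 0 no
matter how much rt_time it accumulates.  If the borrow is skipped because
balancing_is_futile() is true, the same runqueue is reported as exceeded
once rt_time passes 95% of the period.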


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:44 ` Paul E. McKenney
  2014-05-14 19:11   ` Paul Gortmaker
@ 2014-05-15  3:18   ` Mike Galbraith
  2014-05-15 14:45     ` Paul E. McKenney
                       ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-05-15  3:18 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:

> In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> for -rt kernels in production environments.

I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
all with 60 cores isolated.  I didn't have time to rummage, but it looks
like there are still bugs to squash. 

Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
fast mover performance last time I measured.

-Mike


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  3:18   ` Mike Galbraith
@ 2014-05-15 14:45     ` Paul E. McKenney
  2014-05-15 17:27       ` Mike Galbraith
  2014-05-18  4:22     ` Mike Galbraith
  2014-05-19 10:54     ` Peter Zijlstra
  2 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-15 14:45 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> 
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
> 
> I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash. 
> 
> Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> fast mover performance last time I measured.

I do have a report of the RCU grace-period kthreads (rcu_preempt,
rcu_sched, and rcu_bh) consuming excessive CPU time on large boxes,
but this is for workloads with lots of threads and context switches.

Whether relevant or not to your situation, working on it...

							Thanx, Paul


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15 14:45     ` Paul E. McKenney
@ 2014-05-15 17:27       ` Mike Galbraith
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-05-15 17:27 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, 2014-05-15 at 07:45 -0700, Paul E. McKenney wrote: 
> On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> > On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> > 
> > > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > > for -rt kernels in production environments.
> > 
> > I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> > all with 60 cores isolated.  I didn't have time to rummage, but it looks
> > like there are still bugs to squash. 
> > 
> > Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> > fast mover performance last time I measured.
> 
> I do have a report of the RCU grace-period kthreads (rcu_preempt,
> rcu_sched, and rcu_bh) consuming excessive CPU time on large boxes,
> but this is for workloads with lots of threads and context switches.
> 
> Whether relevant or not to your situation, working on it...

RCU signal was swamped by accounting.

-Mike


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  3:18   ` Mike Galbraith
  2014-05-15 14:45     ` Paul E. McKenney
@ 2014-05-18  4:22     ` Mike Galbraith
  2014-05-18  5:20       ` Paul E. McKenney
  2014-05-19 10:54     ` Peter Zijlstra
  2 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-18  4:22 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, 2014-05-15 at 05:18 +0200, Mike Galbraith wrote: 
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> 
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
> 
> I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash. 

I tested a bit more yesterday.  With only NO_HZ_FULL (not NO_HZ_FULL_ALL),
it did go tickless.  A 10 second sample of perturbation numbers below.
Pretty noisy, but it does work in rt.

Below that, some jitter numbers, using a simplified and imperfect model
of a high end rt video game (of the game over, insert 1 gold bar to
continue variety of high end) executive synchronizing a simple load on
60 cores.  Bottom line: if a user thinks that booting with nohz_full=<set>,
to get whatever core quiescence that provides, should improve jitter for
a threaded load even though the tick can't be shut down, he's wrong.

-Mike

vogelweide:/abuild/mike/:[0]# head -180 xx|tail -60
pert/s:      500 >14.10us:        2 min:  1.90 max: 96.86 avg:  5.17 sum/s:  2586us overhead: 0.26%
pert/s:        1 >18.63us:        0 min:  3.38 max:  8.37 avg:  4.71 sum/s:     5us overhead: 0.00%
pert/s:        1 >18.54us:        0 min:  3.41 max:  8.51 avg:  4.61 sum/s:     5us overhead: 0.00%
pert/s:        1 >18.78us:        0 min:  3.45 max:  8.67 avg:  4.41 sum/s:     4us overhead: 0.00%
pert/s:        1 >19.60us:        0 min:  2.68 max:  7.47 avg:  4.30 sum/s:     4us overhead: 0.00%
pert/s:        1 >20.61us:        0 min:  2.60 max:  7.54 avg:  3.94 sum/s:     4us overhead: 0.00%
pert/s:        1 >21.03us:        0 min:  2.70 max:  7.81 avg:  3.91 sum/s:     4us overhead: 0.00%
pert/s:        1 >19.89us:        0 min:  2.65 max:  7.83 avg:  3.98 sum/s:     4us overhead: 0.00%
pert/s:        1 >29.36us:        0 min:  3.78 max:  9.82 avg:  6.10 sum/s:     6us overhead: 0.00%
pert/s:        1 >28.86us:        0 min:  4.56 max: 10.12 avg:  6.36 sum/s:     6us overhead: 0.00%
pert/s:        1 >21.34us:        0 min:  2.54 max:  7.79 avg:  3.38 sum/s:     3us overhead: 0.00%
pert/s:        1 >30.12us:        0 min:  3.51 max:  8.41 avg:  4.47 sum/s:     4us overhead: 0.00%
pert/s:        1 >21.01us:        0 min:  2.44 max:  7.36 avg:  3.33 sum/s:     3us overhead: 0.00%
pert/s:        1 >22.41us:        0 min:  2.42 max:  7.71 avg:  3.60 sum/s:     4us overhead: 0.00%
pert/s:        1 >30.37us:        0 min:  3.46 max:  8.62 avg:  4.49 sum/s:     4us overhead: 0.00%
pert/s:        1 >29.50us:        0 min:  3.43 max:  9.23 avg:  4.13 sum/s:     4us overhead: 0.00%
pert/s:        1 >20.66us:        0 min:  2.49 max:  7.73 avg:  4.24 sum/s:     4us overhead: 0.00%
pert/s:        1 >33.87us:        0 min:  4.45 max:  9.59 avg:  5.63 sum/s:     6us overhead: 0.00%
pert/s:        1 >34.70us:        0 min:  4.47 max: 10.15 avg:  6.41 sum/s:     6us overhead: 0.00%
pert/s:        1 >29.62us:        0 min:  4.49 max:  9.87 avg:  5.69 sum/s:     6us overhead: 0.00%
pert/s:        1 >36.92us:        0 min:  3.53 max:  9.41 avg:  4.48 sum/s:     4us overhead: 0.00%
pert/s:        1 >35.31us:        0 min:  3.69 max:  9.00 avg:  5.30 sum/s:     5us overhead: 0.00%
pert/s:        1 >36.29us:        0 min:  3.34 max:  8.48 avg:  4.48 sum/s:     4us overhead: 0.00%
pert/s:        1 >34.90us:        0 min:  3.39 max:  9.21 avg:  4.45 sum/s:     4us overhead: 0.00%
pert/s:        1 >34.23us:        0 min:  3.37 max:  8.44 avg:  4.54 sum/s:     5us overhead: 0.00%
pert/s:        1 >34.45us:        0 min:  0.05 max:  9.41 avg:  4.40 sum/s:     5us overhead: 0.00%
pert/s:        1 >35.31us:        0 min:  3.89 max:  9.18 avg:  4.30 sum/s:     5us overhead: 0.00%
pert/s:        1 >35.98us:        0 min:  2.80 max:  9.28 avg:  4.74 sum/s:     5us overhead: 0.00%
pert/s:        1 >33.89us:        0 min:  3.15 max:  9.67 avg:  5.07 sum/s:     5us overhead: 0.00%
pert/s:        1 >35.16us:        0 min:  2.56 max:  9.40 avg:  4.84 sum/s:     5us overhead: 0.00%
pert/s:        1 >36.37us:        0 min:  4.49 max:  9.48 avg:  6.12 sum/s:     6us overhead: 0.00%
pert/s:        1 >38.10us:        0 min:  0.04 max: 34.86 avg:  6.62 sum/s:    13us overhead: 0.00%
pert/s:        1 >35.11us:        0 min:  5.05 max: 11.56 avg:  5.88 sum/s:     6us overhead: 0.00%
pert/s:        1 >36.88us:        0 min:  3.77 max: 12.37 avg:  6.13 sum/s:     6us overhead: 0.00%
pert/s:        1 >34.37us:        1 min:  2.08 max:199.64 avg: 20.67 sum/s:    25us overhead: 0.00%
pert/s:        1 >35.57us:        1 min:  2.11 max:198.61 avg: 19.17 sum/s:    25us overhead: 0.00%
pert/s:        1 >33.89us:        1 min:  2.46 max:199.49 avg: 19.85 sum/s:    26us overhead: 0.00%
pert/s:        1 >37.58us:        1 min:  2.34 max:199.79 avg: 19.59 sum/s:    25us overhead: 0.00%
pert/s:        1 >34.57us:        0 min:  3.43 max: 13.37 avg:  5.86 sum/s:     6us overhead: 0.00%
pert/s:        1 >21.10us:        1 min:  2.42 max:199.97 avg: 20.08 sum/s:    26us overhead: 0.00%
pert/s:        1 >20.86us:        1 min:  2.23 max:194.83 avg: 19.69 sum/s:    26us overhead: 0.00%
pert/s:        1 >22.47us:        1 min:  2.15 max:197.13 avg: 19.61 sum/s:    25us overhead: 0.00%
pert/s:        1 >21.42us:        1 min:  2.24 max:198.75 avg: 19.70 sum/s:    26us overhead: 0.00%
pert/s:        1 >34.85us:        0 min:  0.05 max: 10.83 avg:  3.80 sum/s:     5us overhead: 0.00%
pert/s:        1 >33.72us:        0 min:  4.34 max: 11.78 avg:  6.04 sum/s:     6us overhead: 0.00%
pert/s:        1 >21.49us:        2 min:  2.13 max:200.35 avg: 20.22 sum/s:    26us overhead: 0.00%
pert/s:        1 >22.52us:        2 min:  2.32 max:197.07 avg: 21.02 sum/s:    27us overhead: 0.00%
pert/s:        1 >22.35us:        2 min:  2.16 max:197.04 avg: 20.59 sum/s:    27us overhead: 0.00%
pert/s:        1 >35.38us:        0 min:  3.20 max: 10.42 avg:  4.56 sum/s:     5us overhead: 0.00%
pert/s:      306 >17.41us:        1 min:  1.31 max: 51.95 avg:  6.83 sum/s:  2091us overhead: 0.21%
pert/s:        1 >99.81us:        1 min:  2.11 max:196.91 avg: 20.61 sum/s:    27us overhead: 0.00%
pert/s:        1 >21.45us:        2 min:  2.31 max:196.62 avg: 20.49 sum/s:    27us overhead: 0.00%
pert/s:        1 >97.14us:        1 min:  2.22 max:195.97 avg: 21.29 sum/s:    28us overhead: 0.00%
pert/s:        1 >21.94us:        2 min:  2.25 max:199.98 avg: 20.14 sum/s:    26us overhead: 0.00%
pert/s:      206 >75.07us:        1 min:  1.60 max:116.17 avg:  5.83 sum/s:  1202us overhead: 0.12%
pert/s:        1 >96.39us:        1 min:  2.26 max:194.60 avg: 21.29 sum/s:    28us overhead: 0.00%
pert/s:        1 >94.72us:        1 min:  2.20 max:193.63 avg: 21.32 sum/s:    28us overhead: 0.00%
pert/s:        1 >97.23us:        1 min:  2.08 max:198.18 avg: 21.27 sum/s:    28us overhead: 0.00%
pert/s:        1 >88.44us:        0 min:  2.28 max: 11.33 avg:  7.05 sum/s:     8us overhead: 0.00%
pert/s:        1 >89.22us:        0 min:  3.59 max: 15.86 avg:  7.81 sum/s:     8us overhead: 0.00%

model is not picky, calls frame jitter >30us a 'Flier', counts them, tags a few.
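
As a rough illustration of the per-CPU columns in the tables that follow
(Min, Max(Frame), Avg, Fliers), here is a minimal sketch of how such a
summary could be computed from a list of per-frame jitter samples.  The
model's real source is not shown in this thread, so the struct, the
function names and the 0-based frame index are assumptions; only the
30us flier threshold comes from the description above.

```c
#include <stddef.h>

#define FLIER_THRESHOLD_US 30.0   /* ">30us" from the description above */

struct jitter_summary {
	double min, max, avg;   /* microseconds */
	size_t max_frame;       /* index of the worst frame (0-based here) */
	size_t fliers;          /* frames with jitter above the threshold */
};

/* Summarize n jitter samples; returns an all-zero summary when n == 0. */
static struct jitter_summary summarize_jitter(const double *jitter_us, size_t n)
{
	struct jitter_summary s = { 0.0, 0.0, 0.0, 0, 0 };
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return s;
	s.min = s.max = jitter_us[0];
	for (i = 0; i < n; i++) {
		double j = jitter_us[i];

		if (j < s.min)
			s.min = j;
		if (j > s.max) {
			s.max = j;
			s.max_frame = i;
		}
		if (j > FLIER_THRESHOLD_US)
			s.fliers++;
		sum += j;
	}
	s.avg = sum / (double)n;
	return s;
}

/* Run length implied by a FRAMES/FREQ pair, in seconds. */
static long run_seconds(long frames, long freq_hz)
{
	return frames / freq_hz;
}
```

Applied to the FREQ and FRAMES values in the run headers, run_seconds()
gives 1800 seconds for each of the three CPU groups (1728000/960,
1800000/1000 and 540000/300).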

3.14.4-rt5 virgin source nohz_full=4-63

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   1727998   0.0159  184.04 (1170916)0.7079  0.7321    0 (0)     16 (955515,955516,986596,986597,1017316,..1561069)
5   1727998   0.0159  186.94 (1171397)0.4114  0.6508    0 (0)     16 (956356,956357,987076,987077,1017796,..1171397)
6   1727999   0.0159  36.73 (595340)  0.8620  0.8818    0 (0)     11 (86411,86412,91211,91212,96011,..1209942)
7   1727999   0.0159  189.53 (1141636)0.4791  0.6720    0 (0)     17 (895876,926596,926597,957316,957317,..1141637)
8   1728000   0.0159  184.07 (988517) 0.3885  0.6788    0 (0)     16 (773476,773477,804196,804197,834916,..988517)
9   1728000   0.0159  180.74 (1050437)0.3514  0.6649    0 (0)     16 (835396,835397,866116,866117,896836,..1050437)
10  1728000   0.0159  188.84 (1020197)0.4211  0.6945    0 (0)     16 (805156,805157,835876,835877,866596,..1020197)
11  1728000   0.0159  180.98 (959237) 0.3867  0.6802    0 (0)     16 (744196,744197,774916,774917,805636,..959237)
12  1728000   0.0159  176.41 (898276) 0.6384  0.6972    0 (0)     16 (683236,683237,713956,713957,744676,..898277)
13  1728000   0.0159  188.84 (837317) 0.7538  0.8263    0 (0)     16 (622276,622277,652996,652997,683716,..837317)
14  1728000   0.0159  178.83 (1022117)0.5803  0.6995    0 (0)     16 (807076,807077,837796,837797,868516,..1022117)
15  1728000   0.0159  187.17 (838277) 0.7163  0.8367    0 (0)     16 (623236,623237,653956,653957,684676,..838277)
16  1728000   0.0159  184.31 (992357) 0.6860  0.9137    0 (0)     16 (777316,777317,808036,808037,838756,..992357)
17  1728000   0.0159  190.75 (962117) 0.6607  0.9281    0 (0)     17 (716356,747076,747077,777796,777797,..962117)
18  1728000   0.0159  186.46 (870437) 0.6505  0.9303    0 (0)     16 (655396,655397,686116,686117,716836,..870437)
19  1728000   0.0159  187.62 (901636) 0.8962  0.9769    0 (0)     16 (686596,686597,717316,717317,748036,..901637)
20  1728000   0.0159  187.89 (748517) 1.0297  1.0907    0 (0)     16 (533476,533477,564196,564197,594916,..748517)
21  1728000   0.0159  177.84 (779716) 0.9255  1.0430    0 (0)     16 (564676,564677,595396,595397,626116,..779717)
22  1728000   0.0159  192.42 (780197) 0.6549  0.9060    0 (0)     18 (534436,534437,565156,565157,595876,..780197)
23  1728000   0.0159  179.99 (719236) 0.8476  1.0329    0 (0)     16 (504196,504197,534916,534917,565636,..719237)
24  1800000   0.0725  8.75 (1545420)  0.8685  0.6879    0 (0)     
25  1800000   0.0725  20.19 (1550204) 0.8268  0.7200    0 (0)     
26  1800000   0.0725  34.02 (14704)   0.6771  0.7340    0 (0)     104 (14704,14705,46704,46705,78704,..1774705)
27  1800000   0.0725  48.47 (1519205) 0.6290  0.6766    0 (0)     112 (15204,15205,47204,47205,79204,..1775205)
28  1800000   0.0725  65.64 (1711705) 0.6870  0.7524    0 (0)     112 (15704,15705,47704,47705,79704,..1775705)
29  1800000   0.0725  108.41 (1616204)0.4684  0.9344    0 (0)     112 (16204,16205,48204,48205,80204,..1776205)
30  1800000   0.0725  108.17 (1680704)0.8166  1.0311    0 (0)     112 (16704,16705,48704,48705,80704,..1776705)
31  1800000   0.0725  123.81 (49205)  0.6050  1.1110    0 (0)     112 (17204,17205,49204,49205,81204,..1777205)
32  1800000   0.0725  113.42 (17704)  0.6158  0.9614    0 (0)     112 (17704,17705,49704,49705,81704,..1777705)
33  1800000   0.0725  184.94 (1458204)1.0111  1.7618    0 (0)     112 (18204,18205,50204,50205,82204,..1778205)
34  1800000   0.0725  194.72 (1490704)1.1291  1.7317    0 (0)     98 (18704,18705,50704,50705,82704,..1778705)
35  1800000   0.0725  185.56 (339205) 0.5819  1.5599    0 (0)     112 (19204,19205,51204,51205,83204,..1779205)
36  1800000   0.0725  30.45 (227051)  0.9345  1.0711    0 (0)     1 (227051)
37  1800000   0.0725  184.61 (1780205)0.7439  1.5621    0 (0)     112 (20204,20205,52204,52205,84204,..1780205)
38  1800000   0.0725  28.30 (329851)  0.9923  0.8368    0 (0)     
39  1800000   0.0725  26.15 (183951)  0.9777  0.8443    0 (0)     
40  1800000   0.0725  6.75 (1136462)  0.9447  0.7415    0 (0)     
41  1800000   0.0725  6.03 (1128961)  0.8416  0.6431    0 (0)     
42  1800000   0.0725  6.51 (1557012)  0.8961  0.6698    0 (0)     
43  1800000   0.0725  7.08 (1294060)  0.7015  0.6162    0 (0)     
44  540000    0.0032  9.30 (470245)   0.7457  0.5968    0 (0)     
45  540000    0.0032  17.88 (55184)   0.6024  0.9190    0 (0)     
46  540000    0.0032  104.43 (535411) 1.0158  1.0948    0 (0)     50 (305010,305011,314610,314611,324210,..535411)
47  540000    0.0032  17.16 (261898)  0.8780  1.0232    0 (0)     
48  540000    0.0032  17.65 (132511)  0.5251  0.4697    0 (0)     
49  540000    0.0032  166.41 (535860) 0.4444  1.4015    0 (0)     84 (142260,142261,151860,151861,161460,..535861)
50  540000    0.0032  86.30 (132810)  0.6620  0.6966    0 (0)     14 (46410,46411,56010,56011,84810,..132811)
51  540000    0.0032  193.36 (372961) 0.7880  1.6782    0 (0)     62 (8160,8161,17760,17761,27360,..372961)
52  540000    0.0032  113.96 (133110) 0.5096  0.7058    0 (0)     14 (46710,46711,56310,56311,85110,..133111)
53  540000    0.0032  191.69 (325261) 0.9986  1.6550    0 (0)     54 (8460,8461,18060,18061,27660,..325261)
54  540000    0.0032  123.74 (133411) 0.5987  0.7777    0 (0)     14 (47010,47011,56610,56611,85410,..133411)
55  540000    0.0032  193.60 (287161) 0.6729  1.5368    0 (0)     46 (8760,8761,18360,18361,27960,..287161)
56  540000    0.0032  194.79 (133711) 0.8196  1.3677    0 (0)     12 (47310,47311,56910,56911,85710,..133711)
57  540000    0.0032  19.08 (410169)  0.7227  1.1486    0 (0)     
58  540000    0.0032  192.16 (38010)  0.7522  1.3604    0 (0)     8 (9210,9211,18810,18811,28410,..38011)
59  540000    0.0032  18.12 (360437)  0.8875  1.1737    0 (0)     
60  540000    0.0032  23.84 (316830)  1.0781  1.2241    0 (0)     
61  540000    0.0032  27.90 (316815)  1.1200  1.2471    0 (0)     
62  540000    0.0032  22.18 (316830)  0.9777  1.0933    0 (0)     
63  540000    0.0032  26.94 (317310)  1.1227  1.2478    0 (0)     

Reference: 3.12.18-rt25-0.gf8a6df6-rt, nohz_idle, cpuset switches tick ON for rt set

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   1727993   0.0159  4.07 (2599)     0.1072  0.1890    0 (0)     
5   1727993   0.0159  4.55 (807476)   0.0954  0.1669    0 (0)     
6   1727993   0.0159  3.80 (38023)    0.1321  0.2110    0 (0)     
7   1727994   0.0159  3.35 (1724219)  0.0898  0.1628    0 (0)     
8   1727994   0.0159  3.80 (109400)   0.0957  0.1852    0 (0)     
9   1727994   0.0159  3.83 (710923)   0.1001  0.1779    0 (0)     
10  1727994   0.0159  3.35 (529312)   0.1009  0.1741    0 (0)     
11  1727995   0.0159  3.83 (1372590)  0.0935  0.1740    0 (0)     
12  1727995   0.0159  3.83 (51129)    0.0857  0.1724    0 (0)     
13  1727995   0.0159  3.83 (1273109)  0.1028  0.1852    0 (0)     
14  1727996   0.0159  3.59 (486904)   0.1005  0.1811    0 (0)     
15  1727996   0.0159  3.35 (691340)   0.1589  0.1899    0 (0)     
16  1727996   0.0159  4.07 (1638706)  0.1340  0.2526    0 (0)     
17  1727997   0.0159  4.55 (913535)   0.1110  0.2050    0 (0)     
18  1727997   0.0159  4.31 (1704012)  0.1193  0.2129    0 (0)     
19  1727997   0.0159  5.23 (1273925)  0.1434  0.2372    0 (0)     
20  1727997   0.0159  5.26 (16547)    0.1119  0.2259    0 (0)     
21  1727998   0.0159  5.71 (341896)   0.1893  0.2458    0 (0)     
22  1727998   0.0159  4.55 (1276554)  0.1005  0.1961    0 (0)     
23  1727998   0.0159  5.71 (1029507)  0.2141  0.2460    0 (0)     
24  1799998   0.0725  3.98 (1551231)  0.1059  0.0518    0 (0)     
25  1799998   0.0725  2.79 (272233)   0.1192  0.0866    0 (0)     
26  1799998   0.0725  3.03 (272233)   0.1817  0.1317    0 (0)     
27  1799999   0.0725  3.03 (272233)   0.1426  0.1009    0 (0)     
28  1799999   0.0725  2.79 (402235)   0.2632  0.2574    0 (0)     
29  1799999   0.0725  2.46 (387055)   0.1109  0.0709    0 (0)     
30  1799999   0.0725  3.03 (387054)   0.1301  0.0860    0 (0)     
31  1800000   0.0725  4.60 (1329743)  0.3274  0.2551    0 (0)     
32  1800000   0.0725  2.93 (867055)   0.1076  0.0535    0 (0)     
33  1800000   0.0725  2.93 (867055)   0.1049  0.0524    0 (0)     
34  1800000   0.0725  3.50 (1347054)  0.1132  0.0740    0 (0)     
35  1800000   0.0725  3.27 (867054)   0.1076  0.0857    0 (0)     
36  1800000   0.0725  3.27 (867054)   0.2015  0.1381    0 (0)     
37  1800000   0.0725  3.74 (1347054)  0.1020  0.0461    0 (0)     
38  1800000   0.0725  3.17 (867055)   0.1118  0.0752    0 (0)     
39  1800000   0.0725  2.93 (1347055)  0.1092  0.0624    0 (0)     
40  1800000   0.0725  2.55 (867054)   0.1126  0.0703    0 (0)     
41  1800000   0.0725  2.93 (867055)   0.1092  0.0560    0 (0)     
42  1800000   0.0725  3.65 (387055)   0.2079  0.1424    0 (0)     
43  1800000   0.0725  6.51 (905345)   0.3940  0.3751    0 (0)     
44  539999    0.0032  3.10 (260115)   0.3366  0.2650    0 (0)     
45  539999    0.0032  2.62 (116115)   0.3365  0.2648    0 (0)     
46  539999    0.0032  4.53 (17248)    0.0854  0.2177    0 (0)     
47  539999    0.0032  4.06 (142)      0.0767  0.1120    0 (0)     
48  539999    0.0032  3.10 (260116)   0.0604  0.1029    0 (0)     
49  539999    0.0032  3.10 (404115)   0.0901  0.1160    0 (0)     
50  539999    0.0032  3.10 (404116)   0.1026  0.1449    0 (0)     
51  539999    0.0032  2.86 (260116)   0.1019  0.1571    0 (0)     
52  539999    0.0032  3.10 (116115)   0.0776  0.1190    0 (0)     
53  539999    0.0032  3.10 (260115)   0.0719  0.1121    0 (0)     
54  539999    0.0032  3.10 (260115)   0.3323  0.2669    0 (0)     
55  539999    0.0032  3.10 (260115)   0.3534  0.2873    0 (0)     
56  539999    0.0032  4.53 (422169)   0.1143  0.2653    0 (0)     
57  539999    0.0032  3.10 (260116)   0.1021  0.2007    0 (0)     
58  539999    0.0032  3.10 (116116)   0.0996  0.2120    0 (0)     
59  539999    0.0032  2.86 (116116)   0.0989  0.2017    0 (0)     
60  539999    0.0032  2.86 (116115)   0.3619  0.2676    0 (0)     
61  539999    0.0032  2.86 (116115)   0.3453  0.2593    0 (0)     
62  539999    0.0032  2.86 (116116)   0.1029  0.2016    0 (0)     
63  539999    0.0032  3.10 (116116)   0.0858  0.1893    0 (0)

3.14.4-rt5 + patches (hacks from reference) nohz_cpus=4-63

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   1728000   0.0159  10.98 (1625029) 0.4708  0.6897    0 (0)     
5   1728000   0.0159  11.94 (1612693) 0.6770  0.8055    0 (0)     
6   1728000   0.0159  11.90 (1544891) 0.6773  0.6996    0 (0)     
7   1728000   0.0159  13.10 (1357976) 0.5167  0.6947    0 (0)     
8   1728000   0.0159  21.71 (1331577) 0.5297  0.8358    0 (0)     
9   1728000   0.0159  21.68 (1331576) 0.6269  0.7978    0 (0)     
10  1728000   0.0159  19.53 (889433)  0.8569  0.8690    0 (0)     
11  1728000   0.0159  29.82 (1508316) 0.5446  0.7971    0 (0)     
12  1728000   0.0159  24.81 (1502988) 0.5601  0.8393    0 (0)     
13  1728000   0.0159  21.71 (1476396) 0.7357  0.8276    0 (0)     
14  1728000   0.0159  12.86 (1178839) 0.5667  0.8542    0 (0)     
15  1728000   0.0159  11.90 (1671304) 0.6289  0.6630    0 (0)     
16  1728000   0.0159  23.35 (1493387) 0.7169  1.1261    0 (0)     
17  1728000   0.0159  17.87 (1463627) 0.9685  1.3873    0 (0)     
18  1728000   0.0159  16.91 (826801)  0.7671  1.0588    0 (0)     
19  1728000   0.0159  19.06 (1238264) 0.6330  0.9972    0 (0)     
20  1728000   0.0159  37.18 (270001)  1.3230  1.9480    0 (0)     15 (266593,266594,268369,268370,270001,..876386)
21  1728000   0.0159  40.52 (270769)  1.3299  1.6877    0 (0)     14 (266593,266594,268369,268370,270001,..876386)
22  1728000   0.0159  28.86 (273746)  1.3284  2.2402    0 (0)     
23  1728000   0.0159  39.35 (270770)  1.2386  1.6255    0 (0)     13 (266593,266594,268369,268370,270001,..273746)
24  1800000   0.0725  36.41 (282051)  1.3319  1.7074    0 (0)     9 (279551,279552,281251,281252,282051,..285152)
25  1800000   0.0725  40.60 (281252)  1.5767  1.8895    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
26  1800000   0.0725  44.42 (281252)  1.4205  1.7594    0 (0)     16 (277701,277702,279551,279552,281251,..926361)
27  1800000   0.0725  42.27 (281252)  1.3342  1.7801    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
28  1800000   0.0725  43.08 (281251)  0.8064  1.2355    0 (0)     16 (277701,277702,279551,279552,281251,..290202)
29  1800000   0.0725  44.18 (281252)  0.9319  1.0451    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
30  1800000   0.0725  41.56 (279552)  1.5054  1.7734    0 (0)     19 (277701,277702,279551,279552,281251,..1379844)
31  1800000   0.0725  43.23 (285152)  0.8076  0.9825    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
32  1800000   0.0725  64.21 (281252)  0.8604  1.6229    0 (0)     56 (277701,277702,279551,279552,281251,..420022)
33  1800000   0.0725  70.88 (281252)  0.8443  1.7623    0 (0)     843 (252701,264451,264452,269802,276752,..426672)
34  1800000   0.0725  74.46 (281252)  0.8727  1.8996    0 (0)     1741 (281251,281252,282051,282052,283601,..445622)
35  1800000   0.0725  50.14 (412212)  0.8905  2.0577    0 (0)     2716 (368532,369931,369932,372562,372581,..457072)
36  1800000   0.0725  31.16 (466021)  0.8785  0.8549    0 (0)     2 (466021,466022)
37  1800000   0.0725  44.75 (415861)  0.7989  1.3541    0 (0)     323 (408731,408732,411281,413861,413862,..490232)
38  1800000   0.0725  47.37 (411811)  0.9474  2.0748    0 (0)     2936 (400292,400371,400372,400771,400772,..495282)
39  1800000   0.0725  27.97 (466872)  0.8778  0.7751    0 (0)     
40  1800000   0.0725  7.56 (152518)   0.6923  0.6879    0 (0)     
41  1800000   0.0725  7.32 (146921)   0.6207  0.6168    0 (0)     
42  1800000   0.0725  6.99 (1756819)  0.6773  0.6053    0 (0)     
43  1800000   0.0725  6.27 (1275231)  0.8971  0.7166    0 (0)     
44  540000    0.0032  22.18 (416117)  1.2127  1.3930    0 (0)     
45  540000    0.0032  34.34 (124668)  1.3483  2.0767    0 (0)     5 (83864,84374,85574,124667,124668)
46  540000    0.0032  32.43 (123279)  1.2910  2.1807    0 (0)     7 (123278,123279,123353,123354,124928,..124953)
47  540000    0.0032  30.04 (83864)   1.2735  1.9704    0 (0)     1 (83864)
48  540000    0.0032  7.63 (233683)   0.8320  0.7331    0 (0)     
49  540000    0.0032  5.73 (204097)   0.6421  0.5255    0 (0)     
50  540000    0.0032  6.44 (72596)    0.5834  0.5951    0 (0)     
51  540000    0.0032  6.67 (468680)   0.5008  0.5379    0 (0)     
52  540000    0.0032  7.63 (394623)   0.6245  0.5303    0 (0)     
53  540000    0.0032  5.96 (456858)   0.4830  0.4921    0 (0)     
54  540000    0.0032  5.73 (361510)   0.7487  0.5892    0 (0)     
55  540000    0.0032  5.96 (108106)   0.5493  0.5178    0 (0)     
56  540000    0.0032  20.98 (539973)  1.1312  0.9690    0 (0)     
57  540000    0.0032  7.15 (537416)   0.4435  0.4855    0 (0)     
58  540000    0.0032  11.45 (537372)  0.5048  0.5684    0 (0)     
59  540000    0.0032  15.02 (537417)  0.6277  0.8161    0 (0)     
60  540000    0.0032  20.02 (511100)  0.7036  1.2132    0 (0)     
61  540000    0.0032  28.14 (471348)  0.9544  1.3316    0 (0)     
62  540000    0.0032  25.04 (461373)  0.7629  1.2334    0 (0)     
63  540000    0.0032  17.88 (532206)  0.6224  1.2652    0 (0)

3.14.4-rt5 + patches, nohz_cpus=4-63, but with rt cpuset ticked

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   1727990   0.0159  15.75 (1110991) 0.2705  0.6782    0 (0)     
5   1727991   0.0159  18.10 (1152942) 0.2295  0.5379    0 (0)     
6   1727991   0.0159  12.14 (740256)  0.1871  0.4767    0 (0)     
7   1727991   0.0159  13.61 (1398636) 0.1943  0.4162    0 (0)     
8   1727991   0.0159  21.68 (1128942) 0.2343  0.4782    0 (0)     
9   1727991   0.0159  17.66 (1373532) 0.2449  0.6676    0 (0)     
10  1727991   0.0159  21.68 (1370843) 0.3469  0.7723    0 (0)     
11  1727993   0.0159  14.56 (1654553) 0.3085  0.5460    0 (0)     
12  1727994   0.0159  16.91 (1640920) 0.6324  0.6797    0 (0)     
13  1727994   0.0159  14.05 (1646008) 0.4396  0.5460    0 (0)     
14  1727994   0.0159  17.90 (1385628) 0.1940  0.5126    0 (0)     
15  1727994   0.0159  14.53 (1324235) 0.5829  0.6251    0 (0)     
16  1727994   0.0159  18.61 (1170511) 0.2769  0.7185    0 (0)     
17  1727995   0.0159  15.96 (147911)  0.6323  1.1918    0 (0)     
18  1727995   0.0159  17.63 (1147518) 0.3379  0.8251    0 (0)     
19  1727997   0.0159  16.67 (1110990) 0.3504  0.8739    0 (0)     
20  1727997   0.0159  25.02 (1371899) 0.4171  0.9395    0 (0)     
21  1727997   0.0159  18.10 (1604824) 0.5120  0.9121    0 (0)     
22  1727997   0.0159  16.67 (1292219) 0.3707  1.0057    0 (0)     
23  1727997   0.0159  21.92 (1311275) 0.7368  0.9864    0 (0)     
24  1799997   0.0725  11.13 (513155)  0.2668  0.4618    0 (0)     
25  1799997   0.0725  14.47 (57648)   1.3242  1.3780    0 (0)     
26  1799997   0.0725  16.14 (790209)  0.3020  0.6119    0 (0)     
27  1799999   0.0725  18.52 (406878)  0.9193  1.2727    0 (0)     
28  1799999   0.0725  16.38 (1333567) 0.9898  1.0894    0 (0)     
29  1799999   0.0725  17.57 (1388068) 1.5751  1.4865    0 (0)     
30  1800000   0.0725  15.90 (944311)  0.8447  1.0841    0 (0)     
31  1800000   0.0725  19.00 (1118689) 1.2406  1.2589    0 (0)     
32  1800000   0.0725  33.21 (623752)  0.2034  0.5721    0 (0)     2 (623751,623752)
33  1800000   0.0725  43.80 (623751)  0.1869  0.5132    0 (0)     40 (555521,555522,558341,558342,561411,..634502)
34  1800000   0.0725  54.91 (623752)  0.2003  0.6696    0 (0)     256 (621921,621922,622231,622232,622251,..634652)
35  1800000   0.0725  11.61 (1689123) 0.2855  0.3485    0 (0)     
36  1800000   0.0725  19.95 (662351)  0.1990  0.4114    0 (0)     
37  1800000   0.0725  7.32 (775961)   0.1888  0.3463    0 (0)     
38  1800000   0.0725  45.23 (623751)  0.1812  0.5458    0 (0)     72 (555521,555522,558341,558342,561411,..633852)
39  1800000   0.0725  6.13 (2023)     0.1899  0.3141    0 (0)     
40  1800000   0.0725  3.03 (76948)    0.1348  0.1367    0 (0)     
41  1800000   0.0725  3.27 (278393)   0.1473  0.1599    0 (0)     
42  1800000   0.0725  3.74 (149791)   0.1490  0.1477    0 (0)     
43  1800000   0.0725  3.74 (76948)    0.3163  0.2310    0 (0)     
44  540000    0.0032  32.18 (187964)  0.2410  0.9227    0 (0)     19 (166655,166656,186674,186675,187289,..188639)
45  540000    0.0032  13.35 (118119)  0.9129  0.6304    0 (0)     
46  540000    0.0032  38.87 (187290)  0.3234  1.1625    0 (0)     227 (167501,167502,168422,168423,168575,..190395)
47  540000    0.0032  10.97 (70989)   0.7904  0.5492    0 (0)     
48  540000    0.0032  16.22 (526242)  0.1975  0.3578    0 (0)     
49  540000    0.0032  4.06 (113415)   0.1632  0.2182    0 (0)     
50  540000    0.0032  36.00 (348112)  1.0410  1.1357    0 (0)     4 (348112,348113,505655,505656)
51  540000    0.0032  7.63 (410141)   0.1802  0.3413    0 (0)     
52  540000    0.0032  7.63 (388792)   0.1467  0.2892    0 (0)     
53  540000    0.0032  22.41 (526242)  0.8564  1.0304    0 (0)     
54  540000    0.0032  16.22 (526242)  0.3234  0.5915    0 (0)     
55  540000    0.0032  20.51 (523077)  0.8910  1.0718    0 (0)     
56  540000    0.0032  20.51 (433008)  0.4362  1.3129    0 (0)     
57  540000    0.0032  23.84 (429227)  0.4652  1.3327    0 (0)     
58  540000    0.0032  23.60 (409787)  0.3603  1.1818    0 (0)     
59  540000    0.0032  21.69 (379973)  0.5904  1.3033    0 (0)     
60  540000    0.0032  24.80 (365811)  0.6453  1.4206    0 (0)     
61  540000    0.0032  26.23 (351861)  0.8988  1.2698    0 (0)     
62  540000    0.0032  24.32 (354042)  0.9292  1.2590    0 (0)     
63  540000    0.0032  24.80 (354441)  0.6686  1.3598    0 (0)

3.14.4-rt5 + patches. no nohz_full mask supplied, rt cpuset ticked

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   1727999   0.0159  19.77 (255927)  0.6208  0.8240    0 (0)     
5   1727999   0.0159  5.02 (647067)   0.0868  0.1769    0 (0)     
6   1727999   0.0159  5.23 (643274)   0.0783  0.1614    0 (0)     
7   1728000   0.0159  5.26 (645483)   0.0940  0.1667    0 (0)     
8   1728000   0.0159  5.23 (1308493)  0.0928  0.2128    0 (0)     
9   1728000   0.0159  4.31 (7073)     0.0866  0.1984    0 (0)     
10  1728000   0.0159  4.78 (1409)     0.0857  0.2031    0 (0)     
11  1728000   0.0159  6.18 (1177115)  0.0946  0.2038    0 (0)     
12  1728000   0.0159  5.02 (1607995)  0.0921  0.2065    0 (0)     
13  1728000   0.0159  5.98 (1164828)  0.1021  0.2268    0 (0)     
14  1728000   0.0159  7.61 (1143227)  0.1055  0.2670    0 (0)     
15  1728000   0.0159  5.94 (1122923)  0.1346  0.2006    0 (0)     
16  1728000   0.0159  5.98 (285214)   0.1058  0.2706    0 (0)     
17  1728000   0.0159  9.04 (1143131)  0.1198  0.3034    0 (0)     
18  1728000   0.0159  5.98 (962842)   0.0934  0.2315    0 (0)     
19  1728000   0.0159  5.74 (1747)     0.1115  0.2409    0 (0)     
20  1728000   0.0159  5.94 (264838)   0.0931  0.2247    0 (0)     
21  1728000   0.0159  7.88 (389138)   0.1144  0.2702    0 (0)     
22  1728000   0.0159  7.85 (588413)   0.1962  0.2656    0 (0)     
23  1728000   0.0159  5.98 (1110060)  0.1984  0.3365    0 (0)     
24  1800000   0.0725  2.79 (796117)   0.1595  0.1585    0 (0)     
25  1800000   0.0725  2.31 (316117)   0.1086  0.0560    0 (0)     
26  1800000   0.0725  2.46 (796118)   0.1074  0.0544    0 (0)     
27  1800000   0.0725  3.98 (613155)   0.1087  0.0699    0 (0)     
28  1800000   0.0725  3.17 (613156)   0.1085  0.0585    0 (0)     
29  1800000   0.0725  8.18 (613156)   0.2171  0.2721    0 (0)     
30  1800000   0.0725  7.94 (612931)   0.2242  0.2651    0 (0)     
31  1800000   0.0725  7.94 (612931)   0.2499  0.3039    0 (0)     
32  1800000   0.0725  3.03 (1756117)  0.1260  0.0819    0 (0)     
33  1800000   0.0725  3.27 (1085885)  0.5809  0.4470    0 (0)     
34  1800000   0.0725  2.55 (316117)   0.1056  0.0504    0 (0)     
35  1800000   0.0725  3.27 (1965)     0.2121  0.4379    0 (0)     
36  1800000   0.0725  4.13 (1)        0.1324  0.0874    0 (0)     
37  1800000   0.0725  4.13 (1)        0.1808  0.1423    0 (0)     
38  1800000   0.0725  4.13 (1)        0.2027  0.1488    0 (0)     
39  1800000   0.0725  4.13 (1)        0.2094  0.1533    0 (0)     
40  1800000   0.0725  2.93 (316118)   0.1093  0.0742    0 (0)     
41  1800000   0.0725  3.03 (1756116)  0.1971  0.1469    0 (0)     
42  1800000   0.0725  3.27 (1210118)  0.6525  0.5411    0 (0)     
43  1800000   0.0725  3.03 (316117)   0.1048  0.0498    0 (0)     
44  539999    0.0032  2.86 (94835)    0.0828  0.1364    0 (0)     
45  539999    0.0032  2.62 (94834)    0.1914  0.1657    0 (0)     
46  540000    0.0032  3.10 (94834)    0.2435  0.1900    0 (0)     
47  540000    0.0032  2.38 (94834)    0.2505  0.1965    0 (0)     
48  540000    0.0032  3.82 (524593)   0.1307  0.2533    0 (0)     
49  540000    0.0032  2.86 (447946)   0.0904  0.2040    0 (0)     
50  540000    0.0032  3.34 (434056)   0.2087  0.2865    0 (0)     
51  540000    0.0032  3.10 (94835)    0.0921  0.2016    0 (0)     
52  540000    0.0032  7.39 (522302)   0.3460  0.3597    0 (0)     
53  540000    0.0032  7.39 (522302)   0.3449  0.3567    0 (0)     
54  540000    0.0032  7.15 (522302)   0.3259  0.3550    0 (0)     
55  540000    0.0032  7.39 (522302)   0.3274  0.3578    0 (0)     
56  540000    0.0032  6.20 (387845)   0.1547  0.3660    0 (0)     
57  540000    0.0032  7.87 (367398)   0.1425  0.3847    0 (0)     
58  540000    0.0032  7.63 (347661)   0.1272  0.3915    0 (0)     
59  540000    0.0032  9.30 (347660)   0.1239  0.3552    0 (0)     
60  540000    0.0032  12.16 (152143)  0.3169  0.3407    0 (0)     
61  540000    0.0032  10.02 (152143)  0.3359  0.3369    0 (0)     
62  540000    0.0032  12.16 (152143)  0.3347  0.3341    0 (0)     
63  540000    0.0032  12.40 (152143)  0.2970  0.3460    0 (0)
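
For anyone reading the columns: the model calls frame jitter >30us a 'Flier' and counts them per CPU. A minimal C sketch of how one such row could be summarized is below. It is an illustration only, not the test program that produced these numbers; the struct, the function name and the 1-based frame numbering are assumptions, and the Sigma and LastTrans columns are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one CPU's row; field names are illustrative only. */
struct jitter_stats {
	double min;        /* smallest per-frame jitter seen */
	double max;        /* largest per-frame jitter seen */
	double avg;        /* arithmetic mean of all frames */
	size_t max_frame;  /* frame number of the max (assumed 1-based) */
	size_t fliers;     /* frames whose jitter exceeded the threshold */
};

/*
 * Summarize n per-frame jitter samples (microseconds).  A frame is
 * counted as a flier when its jitter is strictly above flier_us.
 */
struct jitter_stats summarize_jitter(const double *jitter_us, size_t n,
				     double flier_us)
{
	struct jitter_stats s;
	double sum = 0.0;
	size_t i;

	assert(jitter_us != NULL && n > 0);

	s.min = s.max = jitter_us[0];
	s.max_frame = 1;
	s.fliers = 0;

	for (i = 0; i < n; i++) {
		double j = jitter_us[i];

		if (j < s.min)
			s.min = j;
		if (j > s.max) {
			s.max = j;
			s.max_frame = i + 1;
		}
		if (j > flier_us)
			s.fliers++;
		sum += j;
	}
	s.avg = sum / (double)n;
	return s;
}
```

Calling summarize_jitter(samples, n, 30.0) on one CPU's samples would give the Min, Max(Frame), Avg and Fliers fields of a row under these assumptions.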


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18  4:22     ` Mike Galbraith
@ 2014-05-18  5:20       ` Paul E. McKenney
  2014-05-18  8:36         ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-18  5:20 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, May 18, 2014 at 06:22:34AM +0200, Mike Galbraith wrote:
> On Thu, 2014-05-15 at 05:18 +0200, Mike Galbraith wrote: 
> > On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> > 
> > > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > > for -rt kernels in production environments.
> > 
> > I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> > all with 60 cores isolated.  I didn't have time to rummage, but it looks
> > like there are still bugs to squash. 
> 
> I tested a bit more yesterday.  With only NO_HZ_FULL (no all), it did go
> tickless.  A 10 second sample of perturbation numbers below.  Pretty
> noisy, but it does work in rt.
> 
> Below that, some jitter numbers, using a simplified and imperfect model
> of a high end rt video game (game over, insert 1 gold bar to continue
> variety of high end) executive synchronizing a simple load on 60 cores.
> Bottom line there was if user thinks booting nohz_full=set to get any
> core quiescence provided by that should improve jitter for a threaded
> load despite not being able to shut tick down, he's wrong.

If you are saying that turning on nohz_full doesn't help unless you
also ensure that there is only one runnable task per CPU, I completely
agree.  If you are saying something else, you lost me.  ;-)

						Thanx, Paul

> -Mike
> 
> vogelweide:/abuild/mike/:[0]# head -180 xx|tail -60
> pert/s:      500 >14.10us:        2 min:  1.90 max: 96.86 avg:  5.17 sum/s:  2586us overhead: 0.26%
> pert/s:        1 >18.63us:        0 min:  3.38 max:  8.37 avg:  4.71 sum/s:     5us overhead: 0.00%
> pert/s:        1 >18.54us:        0 min:  3.41 max:  8.51 avg:  4.61 sum/s:     5us overhead: 0.00%
> pert/s:        1 >18.78us:        0 min:  3.45 max:  8.67 avg:  4.41 sum/s:     4us overhead: 0.00%
> pert/s:        1 >19.60us:        0 min:  2.68 max:  7.47 avg:  4.30 sum/s:     4us overhead: 0.00%
> pert/s:        1 >20.61us:        0 min:  2.60 max:  7.54 avg:  3.94 sum/s:     4us overhead: 0.00%
> pert/s:        1 >21.03us:        0 min:  2.70 max:  7.81 avg:  3.91 sum/s:     4us overhead: 0.00%
> pert/s:        1 >19.89us:        0 min:  2.65 max:  7.83 avg:  3.98 sum/s:     4us overhead: 0.00%
> pert/s:        1 >29.36us:        0 min:  3.78 max:  9.82 avg:  6.10 sum/s:     6us overhead: 0.00%
> pert/s:        1 >28.86us:        0 min:  4.56 max: 10.12 avg:  6.36 sum/s:     6us overhead: 0.00%
> pert/s:        1 >21.34us:        0 min:  2.54 max:  7.79 avg:  3.38 sum/s:     3us overhead: 0.00%
> pert/s:        1 >30.12us:        0 min:  3.51 max:  8.41 avg:  4.47 sum/s:     4us overhead: 0.00%
> pert/s:        1 >21.01us:        0 min:  2.44 max:  7.36 avg:  3.33 sum/s:     3us overhead: 0.00%
> pert/s:        1 >22.41us:        0 min:  2.42 max:  7.71 avg:  3.60 sum/s:     4us overhead: 0.00%
> pert/s:        1 >30.37us:        0 min:  3.46 max:  8.62 avg:  4.49 sum/s:     4us overhead: 0.00%
> pert/s:        1 >29.50us:        0 min:  3.43 max:  9.23 avg:  4.13 sum/s:     4us overhead: 0.00%
> pert/s:        1 >20.66us:        0 min:  2.49 max:  7.73 avg:  4.24 sum/s:     4us overhead: 0.00%
> pert/s:        1 >33.87us:        0 min:  4.45 max:  9.59 avg:  5.63 sum/s:     6us overhead: 0.00%
> pert/s:        1 >34.70us:        0 min:  4.47 max: 10.15 avg:  6.41 sum/s:     6us overhead: 0.00%
> pert/s:        1 >29.62us:        0 min:  4.49 max:  9.87 avg:  5.69 sum/s:     6us overhead: 0.00%
> pert/s:        1 >36.92us:        0 min:  3.53 max:  9.41 avg:  4.48 sum/s:     4us overhead: 0.00%
> pert/s:        1 >35.31us:        0 min:  3.69 max:  9.00 avg:  5.30 sum/s:     5us overhead: 0.00%
> pert/s:        1 >36.29us:        0 min:  3.34 max:  8.48 avg:  4.48 sum/s:     4us overhead: 0.00%
> pert/s:        1 >34.90us:        0 min:  3.39 max:  9.21 avg:  4.45 sum/s:     4us overhead: 0.00%
> pert/s:        1 >34.23us:        0 min:  3.37 max:  8.44 avg:  4.54 sum/s:     5us overhead: 0.00%
> pert/s:        1 >34.45us:        0 min:  0.05 max:  9.41 avg:  4.40 sum/s:     5us overhead: 0.00%
> pert/s:        1 >35.31us:        0 min:  3.89 max:  9.18 avg:  4.30 sum/s:     5us overhead: 0.00%
> pert/s:        1 >35.98us:        0 min:  2.80 max:  9.28 avg:  4.74 sum/s:     5us overhead: 0.00%
> pert/s:        1 >33.89us:        0 min:  3.15 max:  9.67 avg:  5.07 sum/s:     5us overhead: 0.00%
> pert/s:        1 >35.16us:        0 min:  2.56 max:  9.40 avg:  4.84 sum/s:     5us overhead: 0.00%
> pert/s:        1 >36.37us:        0 min:  4.49 max:  9.48 avg:  6.12 sum/s:     6us overhead: 0.00%
> pert/s:        1 >38.10us:        0 min:  0.04 max: 34.86 avg:  6.62 sum/s:    13us overhead: 0.00%
> pert/s:        1 >35.11us:        0 min:  5.05 max: 11.56 avg:  5.88 sum/s:     6us overhead: 0.00%
> pert/s:        1 >36.88us:        0 min:  3.77 max: 12.37 avg:  6.13 sum/s:     6us overhead: 0.00%
> pert/s:        1 >34.37us:        1 min:  2.08 max:199.64 avg: 20.67 sum/s:    25us overhead: 0.00%
> pert/s:        1 >35.57us:        1 min:  2.11 max:198.61 avg: 19.17 sum/s:    25us overhead: 0.00%
> pert/s:        1 >33.89us:        1 min:  2.46 max:199.49 avg: 19.85 sum/s:    26us overhead: 0.00%
> pert/s:        1 >37.58us:        1 min:  2.34 max:199.79 avg: 19.59 sum/s:    25us overhead: 0.00%
> pert/s:        1 >34.57us:        0 min:  3.43 max: 13.37 avg:  5.86 sum/s:     6us overhead: 0.00%
> pert/s:        1 >21.10us:        1 min:  2.42 max:199.97 avg: 20.08 sum/s:    26us overhead: 0.00%
> pert/s:        1 >20.86us:        1 min:  2.23 max:194.83 avg: 19.69 sum/s:    26us overhead: 0.00%
> pert/s:        1 >22.47us:        1 min:  2.15 max:197.13 avg: 19.61 sum/s:    25us overhead: 0.00%
> pert/s:        1 >21.42us:        1 min:  2.24 max:198.75 avg: 19.70 sum/s:    26us overhead: 0.00%
> pert/s:        1 >34.85us:        0 min:  0.05 max: 10.83 avg:  3.80 sum/s:     5us overhead: 0.00%
> pert/s:        1 >33.72us:        0 min:  4.34 max: 11.78 avg:  6.04 sum/s:     6us overhead: 0.00%
> pert/s:        1 >21.49us:        2 min:  2.13 max:200.35 avg: 20.22 sum/s:    26us overhead: 0.00%
> pert/s:        1 >22.52us:        2 min:  2.32 max:197.07 avg: 21.02 sum/s:    27us overhead: 0.00%
> pert/s:        1 >22.35us:        2 min:  2.16 max:197.04 avg: 20.59 sum/s:    27us overhead: 0.00%
> pert/s:        1 >35.38us:        0 min:  3.20 max: 10.42 avg:  4.56 sum/s:     5us overhead: 0.00%
> pert/s:      306 >17.41us:        1 min:  1.31 max: 51.95 avg:  6.83 sum/s:  2091us overhead: 0.21%
> pert/s:        1 >99.81us:        1 min:  2.11 max:196.91 avg: 20.61 sum/s:    27us overhead: 0.00%
> pert/s:        1 >21.45us:        2 min:  2.31 max:196.62 avg: 20.49 sum/s:    27us overhead: 0.00%
> pert/s:        1 >97.14us:        1 min:  2.22 max:195.97 avg: 21.29 sum/s:    28us overhead: 0.00%
> pert/s:        1 >21.94us:        2 min:  2.25 max:199.98 avg: 20.14 sum/s:    26us overhead: 0.00%
> pert/s:      206 >75.07us:        1 min:  1.60 max:116.17 avg:  5.83 sum/s:  1202us overhead: 0.12%
> pert/s:        1 >96.39us:        1 min:  2.26 max:194.60 avg: 21.29 sum/s:    28us overhead: 0.00%
> pert/s:        1 >94.72us:        1 min:  2.20 max:193.63 avg: 21.32 sum/s:    28us overhead: 0.00%
> pert/s:        1 >97.23us:        1 min:  2.08 max:198.18 avg: 21.27 sum/s:    28us overhead: 0.00%
> pert/s:        1 >88.44us:        0 min:  2.28 max: 11.33 avg:  7.05 sum/s:     8us overhead: 0.00%
> pert/s:        1 >89.22us:        0 min:  3.59 max: 15.86 avg:  7.81 sum/s:     8us overhead: 0.00%
> 
> model is not picky, calls frame jitter >30us a 'Flier', counts them, tags a few.
> 
> 3.14.4-rt5 virgin source nohz_full=4-63
> 
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   1727998   0.0159  184.04 (1170916) 0.7079  0.7321    0 (0)     16 (955515,955516,986596,986597,1017316,..1561069)
> 5   1727998   0.0159  186.94 (1171397) 0.4114  0.6508    0 (0)     16 (956356,956357,987076,987077,1017796,..1171397)
> 6   1727999   0.0159  36.73 (595340)  0.8620  0.8818    0 (0)     11 (86411,86412,91211,91212,96011,..1209942)
> 7   1727999   0.0159  189.53 (1141636) 0.4791  0.6720    0 (0)     17 (895876,926596,926597,957316,957317,..1141637)
> 8   1728000   0.0159  184.07 (988517) 0.3885  0.6788    0 (0)     16 (773476,773477,804196,804197,834916,..988517)
> 9   1728000   0.0159  180.74 (1050437) 0.3514  0.6649    0 (0)     16 (835396,835397,866116,866117,896836,..1050437)
> 10  1728000   0.0159  188.84 (1020197) 0.4211  0.6945    0 (0)     16 (805156,805157,835876,835877,866596,..1020197)
> 11  1728000   0.0159  180.98 (959237) 0.3867  0.6802    0 (0)     16 (744196,744197,774916,774917,805636,..959237)
> 12  1728000   0.0159  176.41 (898276) 0.6384  0.6972    0 (0)     16 (683236,683237,713956,713957,744676,..898277)
> 13  1728000   0.0159  188.84 (837317) 0.7538  0.8263    0 (0)     16 (622276,622277,652996,652997,683716,..837317)
> 14  1728000   0.0159  178.83 (1022117) 0.5803  0.6995    0 (0)     16 (807076,807077,837796,837797,868516,..1022117)
> 15  1728000   0.0159  187.17 (838277) 0.7163  0.8367    0 (0)     16 (623236,623237,653956,653957,684676,..838277)
> 16  1728000   0.0159  184.31 (992357) 0.6860  0.9137    0 (0)     16 (777316,777317,808036,808037,838756,..992357)
> 17  1728000   0.0159  190.75 (962117) 0.6607  0.9281    0 (0)     17 (716356,747076,747077,777796,777797,..962117)
> 18  1728000   0.0159  186.46 (870437) 0.6505  0.9303    0 (0)     16 (655396,655397,686116,686117,716836,..870437)
> 19  1728000   0.0159  187.62 (901636) 0.8962  0.9769    0 (0)     16 (686596,686597,717316,717317,748036,..901637)
> 20  1728000   0.0159  187.89 (748517) 1.0297  1.0907    0 (0)     16 (533476,533477,564196,564197,594916,..748517)
> 21  1728000   0.0159  177.84 (779716) 0.9255  1.0430    0 (0)     16 (564676,564677,595396,595397,626116,..779717)
> 22  1728000   0.0159  192.42 (780197) 0.6549  0.9060    0 (0)     18 (534436,534437,565156,565157,595876,..780197)
> 23  1728000   0.0159  179.99 (719236) 0.8476  1.0329    0 (0)     16 (504196,504197,534916,534917,565636,..719237)
> 24  1800000   0.0725  8.75 (1545420)  0.8685  0.6879    0 (0)     
> 25  1800000   0.0725  20.19 (1550204) 0.8268  0.7200    0 (0)     
> 26  1800000   0.0725  34.02 (14704)   0.6771  0.7340    0 (0)     104 (14704,14705,46704,46705,78704,..1774705)
> 27  1800000   0.0725  48.47 (1519205) 0.6290  0.6766    0 (0)     112 (15204,15205,47204,47205,79204,..1775205)
> 28  1800000   0.0725  65.64 (1711705) 0.6870  0.7524    0 (0)     112 (15704,15705,47704,47705,79704,..1775705)
> 29  1800000   0.0725  108.41 (1616204) 0.4684  0.9344    0 (0)     112 (16204,16205,48204,48205,80204,..1776205)
> 30  1800000   0.0725  108.17 (1680704) 0.8166  1.0311    0 (0)     112 (16704,16705,48704,48705,80704,..1776705)
> 31  1800000   0.0725  123.81 (49205)  0.6050  1.1110    0 (0)     112 (17204,17205,49204,49205,81204,..1777205)
> 32  1800000   0.0725  113.42 (17704)  0.6158  0.9614    0 (0)     112 (17704,17705,49704,49705,81704,..1777705)
> 33  1800000   0.0725  184.94 (1458204) 1.0111  1.7618    0 (0)     112 (18204,18205,50204,50205,82204,..1778205)
> 34  1800000   0.0725  194.72 (1490704) 1.1291  1.7317    0 (0)     98 (18704,18705,50704,50705,82704,..1778705)
> 35  1800000   0.0725  185.56 (339205) 0.5819  1.5599    0 (0)     112 (19204,19205,51204,51205,83204,..1779205)
> 36  1800000   0.0725  30.45 (227051)  0.9345  1.0711    0 (0)     1 (227051)
> 37  1800000   0.0725  184.61 (1780205) 0.7439  1.5621    0 (0)     112 (20204,20205,52204,52205,84204,..1780205)
> 38  1800000   0.0725  28.30 (329851)  0.9923  0.8368    0 (0)     
> 39  1800000   0.0725  26.15 (183951)  0.9777  0.8443    0 (0)     
> 40  1800000   0.0725  6.75 (1136462)  0.9447  0.7415    0 (0)     
> 41  1800000   0.0725  6.03 (1128961)  0.8416  0.6431    0 (0)     
> 42  1800000   0.0725  6.51 (1557012)  0.8961  0.6698    0 (0)     
> 43  1800000   0.0725  7.08 (1294060)  0.7015  0.6162    0 (0)     
> 44  540000    0.0032  9.30 (470245)   0.7457  0.5968    0 (0)     
> 45  540000    0.0032  17.88 (55184)   0.6024  0.9190    0 (0)     
> 46  540000    0.0032  104.43 (535411) 1.0158  1.0948    0 (0)     50 (305010,305011,314610,314611,324210,..535411)
> 47  540000    0.0032  17.16 (261898)  0.8780  1.0232    0 (0)     
> 48  540000    0.0032  17.65 (132511)  0.5251  0.4697    0 (0)     
> 49  540000    0.0032  166.41 (535860) 0.4444  1.4015    0 (0)     84 (142260,142261,151860,151861,161460,..535861)
> 50  540000    0.0032  86.30 (132810)  0.6620  0.6966    0 (0)     14 (46410,46411,56010,56011,84810,..132811)
> 51  540000    0.0032  193.36 (372961) 0.7880  1.6782    0 (0)     62 (8160,8161,17760,17761,27360,..372961)
> 52  540000    0.0032  113.96 (133110) 0.5096  0.7058    0 (0)     14 (46710,46711,56310,56311,85110,..133111)
> 53  540000    0.0032  191.69 (325261) 0.9986  1.6550    0 (0)     54 (8460,8461,18060,18061,27660,..325261)
> 54  540000    0.0032  123.74 (133411) 0.5987  0.7777    0 (0)     14 (47010,47011,56610,56611,85410,..133411)
> 55  540000    0.0032  193.60 (287161) 0.6729  1.5368    0 (0)     46 (8760,8761,18360,18361,27960,..287161)
> 56  540000    0.0032  194.79 (133711) 0.8196  1.3677    0 (0)     12 (47310,47311,56910,56911,85710,..133711)
> 57  540000    0.0032  19.08 (410169)  0.7227  1.1486    0 (0)     
> 58  540000    0.0032  192.16 (38010)  0.7522  1.3604    0 (0)     8 (9210,9211,18810,18811,28410,..38011)
> 59  540000    0.0032  18.12 (360437)  0.8875  1.1737    0 (0)     
> 60  540000    0.0032  23.84 (316830)  1.0781  1.2241    0 (0)     
> 61  540000    0.0032  27.90 (316815)  1.1200  1.2471    0 (0)     
> 62  540000    0.0032  22.18 (316830)  0.9777  1.0933    0 (0)     
> 63  540000    0.0032  26.94 (317310)  1.1227  1.2478    0 (0)     
> 
> Reference: 3.12.18-rt25-0.gf8a6df6-rt, nohz_idle, cpuset switches tick ON for rt set
> 
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   1727993   0.0159  4.07 (2599)     0.1072  0.1890    0 (0)     
> 5   1727993   0.0159  4.55 (807476)   0.0954  0.1669    0 (0)     
> 6   1727993   0.0159  3.80 (38023)    0.1321  0.2110    0 (0)     
> 7   1727994   0.0159  3.35 (1724219)  0.0898  0.1628    0 (0)     
> 8   1727994   0.0159  3.80 (109400)   0.0957  0.1852    0 (0)     
> 9   1727994   0.0159  3.83 (710923)   0.1001  0.1779    0 (0)     
> 10  1727994   0.0159  3.35 (529312)   0.1009  0.1741    0 (0)     
> 11  1727995   0.0159  3.83 (1372590)  0.0935  0.1740    0 (0)     
> 12  1727995   0.0159  3.83 (51129)    0.0857  0.1724    0 (0)     
> 13  1727995   0.0159  3.83 (1273109)  0.1028  0.1852    0 (0)     
> 14  1727996   0.0159  3.59 (486904)   0.1005  0.1811    0 (0)     
> 15  1727996   0.0159  3.35 (691340)   0.1589  0.1899    0 (0)     
> 16  1727996   0.0159  4.07 (1638706)  0.1340  0.2526    0 (0)     
> 17  1727997   0.0159  4.55 (913535)   0.1110  0.2050    0 (0)     
> 18  1727997   0.0159  4.31 (1704012)  0.1193  0.2129    0 (0)     
> 19  1727997   0.0159  5.23 (1273925)  0.1434  0.2372    0 (0)     
> 20  1727997   0.0159  5.26 (16547)    0.1119  0.2259    0 (0)     
> 21  1727998   0.0159  5.71 (341896)   0.1893  0.2458    0 (0)     
> 22  1727998   0.0159  4.55 (1276554)  0.1005  0.1961    0 (0)     
> 23  1727998   0.0159  5.71 (1029507)  0.2141  0.2460    0 (0)     
> 24  1799998   0.0725  3.98 (1551231)  0.1059  0.0518    0 (0)     
> 25  1799998   0.0725  2.79 (272233)   0.1192  0.0866    0 (0)     
> 26  1799998   0.0725  3.03 (272233)   0.1817  0.1317    0 (0)     
> 27  1799999   0.0725  3.03 (272233)   0.1426  0.1009    0 (0)     
> 28  1799999   0.0725  2.79 (402235)   0.2632  0.2574    0 (0)     
> 29  1799999   0.0725  2.46 (387055)   0.1109  0.0709    0 (0)     
> 30  1799999   0.0725  3.03 (387054)   0.1301  0.0860    0 (0)     
> 31  1800000   0.0725  4.60 (1329743)  0.3274  0.2551    0 (0)     
> 32  1800000   0.0725  2.93 (867055)   0.1076  0.0535    0 (0)     
> 33  1800000   0.0725  2.93 (867055)   0.1049  0.0524    0 (0)     
> 34  1800000   0.0725  3.50 (1347054)  0.1132  0.0740    0 (0)     
> 35  1800000   0.0725  3.27 (867054)   0.1076  0.0857    0 (0)     
> 36  1800000   0.0725  3.27 (867054)   0.2015  0.1381    0 (0)     
> 37  1800000   0.0725  3.74 (1347054)  0.1020  0.0461    0 (0)     
> 38  1800000   0.0725  3.17 (867055)   0.1118  0.0752    0 (0)     
> 39  1800000   0.0725  2.93 (1347055)  0.1092  0.0624    0 (0)     
> 40  1800000   0.0725  2.55 (867054)   0.1126  0.0703    0 (0)     
> 41  1800000   0.0725  2.93 (867055)   0.1092  0.0560    0 (0)     
> 42  1800000   0.0725  3.65 (387055)   0.2079  0.1424    0 (0)     
> 43  1800000   0.0725  6.51 (905345)   0.3940  0.3751    0 (0)     
> 44  539999    0.0032  3.10 (260115)   0.3366  0.2650    0 (0)     
> 45  539999    0.0032  2.62 (116115)   0.3365  0.2648    0 (0)     
> 46  539999    0.0032  4.53 (17248)    0.0854  0.2177    0 (0)     
> 47  539999    0.0032  4.06 (142)      0.0767  0.1120    0 (0)     
> 48  539999    0.0032  3.10 (260116)   0.0604  0.1029    0 (0)     
> 49  539999    0.0032  3.10 (404115)   0.0901  0.1160    0 (0)     
> 50  539999    0.0032  3.10 (404116)   0.1026  0.1449    0 (0)     
> 51  539999    0.0032  2.86 (260116)   0.1019  0.1571    0 (0)     
> 52  539999    0.0032  3.10 (116115)   0.0776  0.1190    0 (0)     
> 53  539999    0.0032  3.10 (260115)   0.0719  0.1121    0 (0)     
> 54  539999    0.0032  3.10 (260115)   0.3323  0.2669    0 (0)     
> 55  539999    0.0032  3.10 (260115)   0.3534  0.2873    0 (0)     
> 56  539999    0.0032  4.53 (422169)   0.1143  0.2653    0 (0)     
> 57  539999    0.0032  3.10 (260116)   0.1021  0.2007    0 (0)     
> 58  539999    0.0032  3.10 (116116)   0.0996  0.2120    0 (0)     
> 59  539999    0.0032  2.86 (116116)   0.0989  0.2017    0 (0)     
> 60  539999    0.0032  2.86 (116115)   0.3619  0.2676    0 (0)     
> 61  539999    0.0032  2.86 (116115)   0.3453  0.2593    0 (0)     
> 62  539999    0.0032  2.86 (116116)   0.1029  0.2016    0 (0)     
> 63  539999    0.0032  3.10 (116116)   0.0858  0.1893    0 (0)
> 
> 3.14.4-rt5 + patches (hacks from reference) nohz_cpus=4-63
> 
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   1728000   0.0159  10.98 (1625029) 0.4708  0.6897    0 (0)     
> 5   1728000   0.0159  11.94 (1612693) 0.6770  0.8055    0 (0)     
> 6   1728000   0.0159  11.90 (1544891) 0.6773  0.6996    0 (0)     
> 7   1728000   0.0159  13.10 (1357976) 0.5167  0.6947    0 (0)     
> 8   1728000   0.0159  21.71 (1331577) 0.5297  0.8358    0 (0)     
> 9   1728000   0.0159  21.68 (1331576) 0.6269  0.7978    0 (0)     
> 10  1728000   0.0159  19.53 (889433)  0.8569  0.8690    0 (0)     
> 11  1728000   0.0159  29.82 (1508316) 0.5446  0.7971    0 (0)     
> 12  1728000   0.0159  24.81 (1502988) 0.5601  0.8393    0 (0)     
> 13  1728000   0.0159  21.71 (1476396) 0.7357  0.8276    0 (0)     
> 14  1728000   0.0159  12.86 (1178839) 0.5667  0.8542    0 (0)     
> 15  1728000   0.0159  11.90 (1671304) 0.6289  0.6630    0 (0)     
> 16  1728000   0.0159  23.35 (1493387) 0.7169  1.1261    0 (0)     
> 17  1728000   0.0159  17.87 (1463627) 0.9685  1.3873    0 (0)     
> 18  1728000   0.0159  16.91 (826801)  0.7671  1.0588    0 (0)     
> 19  1728000   0.0159  19.06 (1238264) 0.6330  0.9972    0 (0)     
> 20  1728000   0.0159  37.18 (270001)  1.3230  1.9480    0 (0)     15 (266593,266594,268369,268370,270001,..876386)
> 21  1728000   0.0159  40.52 (270769)  1.3299  1.6877    0 (0)     14 (266593,266594,268369,268370,270001,..876386)
> 22  1728000   0.0159  28.86 (273746)  1.3284  2.2402    0 (0)     
> 23  1728000   0.0159  39.35 (270770)  1.2386  1.6255    0 (0)     13 (266593,266594,268369,268370,270001,..273746)
> 24  1800000   0.0725  36.41 (282051)  1.3319  1.7074    0 (0)     9 (279551,279552,281251,281252,282051,..285152)
> 25  1800000   0.0725  40.60 (281252)  1.5767  1.8895    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
> 26  1800000   0.0725  44.42 (281252)  1.4205  1.7594    0 (0)     16 (277701,277702,279551,279552,281251,..926361)
> 27  1800000   0.0725  42.27 (281252)  1.3342  1.7801    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
> 28  1800000   0.0725  43.08 (281251)  0.8064  1.2355    0 (0)     16 (277701,277702,279551,279552,281251,..290202)
> 29  1800000   0.0725  44.18 (281252)  0.9319  1.0451    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
> 30  1800000   0.0725  41.56 (279552)  1.5054  1.7734    0 (0)     19 (277701,277702,279551,279552,281251,..1379844)
> 31  1800000   0.0725  43.23 (285152)  0.8076  0.9825    0 (0)     14 (277701,277702,279551,279552,281251,..285152)
> 32  1800000   0.0725  64.21 (281252)  0.8604  1.6229    0 (0)     56 (277701,277702,279551,279552,281251,..420022)
> 33  1800000   0.0725  70.88 (281252)  0.8443  1.7623    0 (0)     843 (252701,264451,264452,269802,276752,..426672)
> 34  1800000   0.0725  74.46 (281252)  0.8727  1.8996    0 (0)     1741 (281251,281252,282051,282052,283601,..445622)
> 35  1800000   0.0725  50.14 (412212)  0.8905  2.0577    0 (0)     2716 (368532,369931,369932,372562,372581,..457072)
> 36  1800000   0.0725  31.16 (466021)  0.8785  0.8549    0 (0)     2 (466021,466022)
> 37  1800000   0.0725  44.75 (415861)  0.7989  1.3541    0 (0)     323 (408731,408732,411281,413861,413862,..490232)
> 38  1800000   0.0725  47.37 (411811)  0.9474  2.0748    0 (0)     2936 (400292,400371,400372,400771,400772,..495282)
> 39  1800000   0.0725  27.97 (466872)  0.8778  0.7751    0 (0)     
> 40  1800000   0.0725  7.56 (152518)   0.6923  0.6879    0 (0)     
> 41  1800000   0.0725  7.32 (146921)   0.6207  0.6168    0 (0)     
> 42  1800000   0.0725  6.99 (1756819)  0.6773  0.6053    0 (0)     
> 43  1800000   0.0725  6.27 (1275231)  0.8971  0.7166    0 (0)     
> 44  540000    0.0032  22.18 (416117)  1.2127  1.3930    0 (0)     
> 45  540000    0.0032  34.34 (124668)  1.3483  2.0767    0 (0)     5 (83864,84374,85574,124667,124668)
> 46  540000    0.0032  32.43 (123279)  1.2910  2.1807    0 (0)     7 (123278,123279,123353,123354,124928,..124953)
> 47  540000    0.0032  30.04 (83864)   1.2735  1.9704    0 (0)     1 (83864)
> 48  540000    0.0032  7.63 (233683)   0.8320  0.7331    0 (0)     
> 49  540000    0.0032  5.73 (204097)   0.6421  0.5255    0 (0)     
> 50  540000    0.0032  6.44 (72596)    0.5834  0.5951    0 (0)     
> 51  540000    0.0032  6.67 (468680)   0.5008  0.5379    0 (0)     
> 52  540000    0.0032  7.63 (394623)   0.6245  0.5303    0 (0)     
> 53  540000    0.0032  5.96 (456858)   0.4830  0.4921    0 (0)     
> 54  540000    0.0032  5.73 (361510)   0.7487  0.5892    0 (0)     
> 55  540000    0.0032  5.96 (108106)   0.5493  0.5178    0 (0)     
> 56  540000    0.0032  20.98 (539973)  1.1312  0.9690    0 (0)     
> 57  540000    0.0032  7.15 (537416)   0.4435  0.4855    0 (0)     
> 58  540000    0.0032  11.45 (537372)  0.5048  0.5684    0 (0)     
> 59  540000    0.0032  15.02 (537417)  0.6277  0.8161    0 (0)     
> 60  540000    0.0032  20.02 (511100)  0.7036  1.2132    0 (0)     
> 61  540000    0.0032  28.14 (471348)  0.9544  1.3316    0 (0)     
> 62  540000    0.0032  25.04 (461373)  0.7629  1.2334    0 (0)     
> 63  540000    0.0032  17.88 (532206)  0.6224  1.2652    0 (0)
> 
> 3.14.4-rt5 + patches, nohz_cps=4-63, but with rt cpuset ticked
> 
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   1727990   0.0159  15.75 (1110991) 0.2705  0.6782    0 (0)     
> 5   1727991   0.0159  18.10 (1152942) 0.2295  0.5379    0 (0)     
> 6   1727991   0.0159  12.14 (740256)  0.1871  0.4767    0 (0)     
> 7   1727991   0.0159  13.61 (1398636) 0.1943  0.4162    0 (0)     
> 8   1727991   0.0159  21.68 (1128942) 0.2343  0.4782    0 (0)     
> 9   1727991   0.0159  17.66 (1373532) 0.2449  0.6676    0 (0)     
> 10  1727991   0.0159  21.68 (1370843) 0.3469  0.7723    0 (0)     
> 11  1727993   0.0159  14.56 (1654553) 0.3085  0.5460    0 (0)     
> 12  1727994   0.0159  16.91 (1640920) 0.6324  0.6797    0 (0)     
> 13  1727994   0.0159  14.05 (1646008) 0.4396  0.5460    0 (0)     
> 14  1727994   0.0159  17.90 (1385628) 0.1940  0.5126    0 (0)     
> 15  1727994   0.0159  14.53 (1324235) 0.5829  0.6251    0 (0)     
> 16  1727994   0.0159  18.61 (1170511) 0.2769  0.7185    0 (0)     
> 17  1727995   0.0159  15.96 (147911)  0.6323  1.1918    0 (0)     
> 18  1727995   0.0159  17.63 (1147518) 0.3379  0.8251    0 (0)     
> 19  1727997   0.0159  16.67 (1110990) 0.3504  0.8739    0 (0)     
> 20  1727997   0.0159  25.02 (1371899) 0.4171  0.9395    0 (0)     
> 21  1727997   0.0159  18.10 (1604824) 0.5120  0.9121    0 (0)     
> 22  1727997   0.0159  16.67 (1292219) 0.3707  1.0057    0 (0)     
> 23  1727997   0.0159  21.92 (1311275) 0.7368  0.9864    0 (0)     
> 24  1799997   0.0725  11.13 (513155)  0.2668  0.4618    0 (0)     
> 25  1799997   0.0725  14.47 (57648)   1.3242  1.3780    0 (0)     
> 26  1799997   0.0725  16.14 (790209)  0.3020  0.6119    0 (0)     
> 27  1799999   0.0725  18.52 (406878)  0.9193  1.2727    0 (0)     
> 28  1799999   0.0725  16.38 (1333567) 0.9898  1.0894    0 (0)     
> 29  1799999   0.0725  17.57 (1388068) 1.5751  1.4865    0 (0)     
> 30  1800000   0.0725  15.90 (944311)  0.8447  1.0841    0 (0)     
> 31  1800000   0.0725  19.00 (1118689) 1.2406  1.2589    0 (0)     
> 32  1800000   0.0725  33.21 (623752)  0.2034  0.5721    0 (0)     2 (623751,623752)
> 33  1800000   0.0725  43.80 (623751)  0.1869  0.5132    0 (0)     40 (555521,555522,558341,558342,561411,..634502)
> 34  1800000   0.0725  54.91 (623752)  0.2003  0.6696    0 (0)     256 (621921,621922,622231,622232,622251,..634652)
> 35  1800000   0.0725  11.61 (1689123) 0.2855  0.3485    0 (0)     
> 36  1800000   0.0725  19.95 (662351)  0.1990  0.4114    0 (0)     
> 37  1800000   0.0725  7.32 (775961)   0.1888  0.3463    0 (0)     
> 38  1800000   0.0725  45.23 (623751)  0.1812  0.5458    0 (0)     72 (555521,555522,558341,558342,561411,..633852)
> 39  1800000   0.0725  6.13 (2023)     0.1899  0.3141    0 (0)     
> 40  1800000   0.0725  3.03 (76948)    0.1348  0.1367    0 (0)     
> 41  1800000   0.0725  3.27 (278393)   0.1473  0.1599    0 (0)     
> 42  1800000   0.0725  3.74 (149791)   0.1490  0.1477    0 (0)     
> 43  1800000   0.0725  3.74 (76948)    0.3163  0.2310    0 (0)     
> 44  540000    0.0032  32.18 (187964)  0.2410  0.9227    0 (0)     19 (166655,166656,186674,186675,187289,..188639)
> 45  540000    0.0032  13.35 (118119)  0.9129  0.6304    0 (0)     
> 46  540000    0.0032  38.87 (187290)  0.3234  1.1625    0 (0)     227 (167501,167502,168422,168423,168575,..190395)
> 47  540000    0.0032  10.97 (70989)   0.7904  0.5492    0 (0)     
> 48  540000    0.0032  16.22 (526242)  0.1975  0.3578    0 (0)     
> 49  540000    0.0032  4.06 (113415)   0.1632  0.2182    0 (0)     
> 50  540000    0.0032  36.00 (348112)  1.0410  1.1357    0 (0)     4 (348112,348113,505655,505656)
> 51  540000    0.0032  7.63 (410141)   0.1802  0.3413    0 (0)     
> 52  540000    0.0032  7.63 (388792)   0.1467  0.2892    0 (0)     
> 53  540000    0.0032  22.41 (526242)  0.8564  1.0304    0 (0)     
> 54  540000    0.0032  16.22 (526242)  0.3234  0.5915    0 (0)     
> 55  540000    0.0032  20.51 (523077)  0.8910  1.0718    0 (0)     
> 56  540000    0.0032  20.51 (433008)  0.4362  1.3129    0 (0)     
> 57  540000    0.0032  23.84 (429227)  0.4652  1.3327    0 (0)     
> 58  540000    0.0032  23.60 (409787)  0.3603  1.1818    0 (0)     
> 59  540000    0.0032  21.69 (379973)  0.5904  1.3033    0 (0)     
> 60  540000    0.0032  24.80 (365811)  0.6453  1.4206    0 (0)     
> 61  540000    0.0032  26.23 (351861)  0.8988  1.2698    0 (0)     
> 62  540000    0.0032  24.32 (354042)  0.9292  1.2590    0 (0)     
> 63  540000    0.0032  24.80 (354441)  0.6686  1.3598    0 (0)
> 
> 3.14.4-rt5 + patches. no nohz_full mask supplied, rt cpuset ticked
> 
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   1727999   0.0159  19.77 (255927)  0.6208  0.8240    0 (0)     
> 5   1727999   0.0159  5.02 (647067)   0.0868  0.1769    0 (0)     
> 6   1727999   0.0159  5.23 (643274)   0.0783  0.1614    0 (0)     
> 7   1728000   0.0159  5.26 (645483)   0.0940  0.1667    0 (0)     
> 8   1728000   0.0159  5.23 (1308493)  0.0928  0.2128    0 (0)     
> 9   1728000   0.0159  4.31 (7073)     0.0866  0.1984    0 (0)     
> 10  1728000   0.0159  4.78 (1409)     0.0857  0.2031    0 (0)     
> 11  1728000   0.0159  6.18 (1177115)  0.0946  0.2038    0 (0)     
> 12  1728000   0.0159  5.02 (1607995)  0.0921  0.2065    0 (0)     
> 13  1728000   0.0159  5.98 (1164828)  0.1021  0.2268    0 (0)     
> 14  1728000   0.0159  7.61 (1143227)  0.1055  0.2670    0 (0)     
> 15  1728000   0.0159  5.94 (1122923)  0.1346  0.2006    0 (0)     
> 16  1728000   0.0159  5.98 (285214)   0.1058  0.2706    0 (0)     
> 17  1728000   0.0159  9.04 (1143131)  0.1198  0.3034    0 (0)     
> 18  1728000   0.0159  5.98 (962842)   0.0934  0.2315    0 (0)     
> 19  1728000   0.0159  5.74 (1747)     0.1115  0.2409    0 (0)     
> 20  1728000   0.0159  5.94 (264838)   0.0931  0.2247    0 (0)     
> 21  1728000   0.0159  7.88 (389138)   0.1144  0.2702    0 (0)     
> 22  1728000   0.0159  7.85 (588413)   0.1962  0.2656    0 (0)     
> 23  1728000   0.0159  5.98 (1110060)  0.1984  0.3365    0 (0)     
> 24  1800000   0.0725  2.79 (796117)   0.1595  0.1585    0 (0)     
> 25  1800000   0.0725  2.31 (316117)   0.1086  0.0560    0 (0)     
> 26  1800000   0.0725  2.46 (796118)   0.1074  0.0544    0 (0)     
> 27  1800000   0.0725  3.98 (613155)   0.1087  0.0699    0 (0)     
> 28  1800000   0.0725  3.17 (613156)   0.1085  0.0585    0 (0)     
> 29  1800000   0.0725  8.18 (613156)   0.2171  0.2721    0 (0)     
> 30  1800000   0.0725  7.94 (612931)   0.2242  0.2651    0 (0)     
> 31  1800000   0.0725  7.94 (612931)   0.2499  0.3039    0 (0)     
> 32  1800000   0.0725  3.03 (1756117)  0.1260  0.0819    0 (0)     
> 33  1800000   0.0725  3.27 (1085885)  0.5809  0.4470    0 (0)     
> 34  1800000   0.0725  2.55 (316117)   0.1056  0.0504    0 (0)     
> 35  1800000   0.0725  3.27 (1965)     0.2121  0.4379    0 (0)     
> 36  1800000   0.0725  4.13 (1)        0.1324  0.0874    0 (0)     
> 37  1800000   0.0725  4.13 (1)        0.1808  0.1423    0 (0)     
> 38  1800000   0.0725  4.13 (1)        0.2027  0.1488    0 (0)     
> 39  1800000   0.0725  4.13 (1)        0.2094  0.1533    0 (0)     
> 40  1800000   0.0725  2.93 (316118)   0.1093  0.0742    0 (0)     
> 41  1800000   0.0725  3.03 (1756116)  0.1971  0.1469    0 (0)     
> 42  1800000   0.0725  3.27 (1210118)  0.6525  0.5411    0 (0)     
> 43  1800000   0.0725  3.03 (316117)   0.1048  0.0498    0 (0)     
> 44  539999    0.0032  2.86 (94835)    0.0828  0.1364    0 (0)     
> 45  539999    0.0032  2.62 (94834)    0.1914  0.1657    0 (0)     
> 46  540000    0.0032  3.10 (94834)    0.2435  0.1900    0 (0)     
> 47  540000    0.0032  2.38 (94834)    0.2505  0.1965    0 (0)     
> 48  540000    0.0032  3.82 (524593)   0.1307  0.2533    0 (0)     
> 49  540000    0.0032  2.86 (447946)   0.0904  0.2040    0 (0)     
> 50  540000    0.0032  3.34 (434056)   0.2087  0.2865    0 (0)     
> 51  540000    0.0032  3.10 (94835)    0.0921  0.2016    0 (0)     
> 52  540000    0.0032  7.39 (522302)   0.3460  0.3597    0 (0)     
> 53  540000    0.0032  7.39 (522302)   0.3449  0.3567    0 (0)     
> 54  540000    0.0032  7.15 (522302)   0.3259  0.3550    0 (0)     
> 55  540000    0.0032  7.39 (522302)   0.3274  0.3578    0 (0)     
> 56  540000    0.0032  6.20 (387845)   0.1547  0.3660    0 (0)     
> 57  540000    0.0032  7.87 (367398)   0.1425  0.3847    0 (0)     
> 58  540000    0.0032  7.63 (347661)   0.1272  0.3915    0 (0)     
> 59  540000    0.0032  9.30 (347660)   0.1239  0.3552    0 (0)     
> 60  540000    0.0032  12.16 (152143)  0.3169  0.3407    0 (0)     
> 61  540000    0.0032  10.02 (152143)  0.3359  0.3369    0 (0)     
> 62  540000    0.0032  12.16 (152143)  0.3347  0.3341    0 (0)     
> 63  540000    0.0032  12.40 (152143)  0.2970  0.3460    0 (0)
> 
> 

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18  5:20       ` Paul E. McKenney
@ 2014-05-18  8:36         ` Mike Galbraith
  2014-05-18 15:58           ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-18  8:36 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:

> If you are saying that turning on nohz_full doesn't help unless you
> also ensure that there is only one runnable task per CPU, I completely
> agree.  If you are saying something else, you lost me.  ;-)

Yup, that's it more or less.  It's not only single task loads that could
benefit from better isolation, but if isolation improving measures are
tied to nohz_full, other sensitive loads will suffer if they try to use
isolation improvements.

-Mike

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18  8:36         ` Mike Galbraith
@ 2014-05-18 15:58           ` Paul E. McKenney
  2014-05-19  2:44             ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-18 15:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> 
> > If you are saying that turning on nohz_full doesn't help unless you
> > also ensure that there is only one runnable task per CPU, I completely
> > agree.  If you are saying something else, you lost me.  ;-)
> 
> Yup, that's it more or less.  It's not only single task loads that could
> benefit from better isolation, but if isolation improving measures are
> tied to nohz_full, other sensitive loads will suffer if they try to use
> isolation improvements.

So you are arguing for a separate Kconfig variable that does the isolation?
So that NO_HZ_FULL selects this new variable, and (for example) RCU
uses this new variable to decide when to pin the grace-period kthreads
onto the housekeeping CPU?

							Thanx, Paul

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18 15:58           ` Paul E. McKenney
@ 2014-05-19  2:44             ` Mike Galbraith
  2014-05-19  5:34               ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-19  2:44 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > 
> > > If you are saying that turning on nohz_full doesn't help unless you
> > > also ensure that there is only one runnable task per CPU, I completely
> > > agree.  If you are saying something else, you lost me.  ;-)
> > 
> > Yup, that's it more or less.  It's not only single task loads that could
> > benefit from better isolation, but if isolation improving measures are
> > tied to nohz_full, other sensitive loads will suffer if they try to use
> > isolation improvements.
> 
> So you are arguing for a separate Kconfig variable that does the isolation?
> So that NO_HZ_FULL selects this new variable, and (for example) RCU
> uses this new variable to decide when to pin the grace-period kthreads
> onto the housekeeping CPU?

I'm thinking more about runtime, but yes.

The tick mode really wants to be selectable per set (in my boxen you can
switch between nohz off/idle, but not yet nohz_full, that might get real
interesting).  You saw in my numbers that ticked is far better for the
threaded rt load, but what if the total load has both sensitive rt and
compute components to worry about?  The rt component wants relief from
the jitter that flipping the tick inflicts, but also wants as little
disturbance as possible, so RCU offload and whatever other measures that
are or become available are perhaps interesting to it as well.  The
numbers showed that here and now the two modes can work together in the
same box, I can have my rt set ticking away, and other cores doing
tickless compute, but enabling that via common config (distros don't
want to ship many kernel flavors) has a cost to rt performance.

Ideally, bean counting would be switchable too, giving all components
the environment they like best.

-Mike

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-19  2:44             ` Mike Galbraith
@ 2014-05-19  5:34               ` Paul E. McKenney
  2014-05-20 14:53                 ` Frederic Weisbecker
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-19  5:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, fweisbec

On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > 
> > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > also ensure that there is only one runnable task per CPU, I completely
> > > > agree.  If you are saying something else, you lost me.  ;-)
> > > 
> > > Yup, that's it more or less.  It's not only single task loads that could
> > > benefit from better isolation, but if isolation improving measures are
> > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > isolation improvements.
> > 
> > So you are arguing for a separate Kconfig variable that does the isolation?
> > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > uses this new variable to decide when to pin the grace-period kthreads
> > onto the housekeeping CPU?
> 
> I'm thinking more about runtime, but yes.
> 
> The tick mode really wants to be selectable per set (in my boxen you can
> switch between nohz off/idle, but not yet nohz_full, that might get real
> interesting).  You saw in my numbers that ticked is far better for the
> threaded rt load, but what if the total load has both sensitive rt and
> compute components to worry about?  The rt component wants relief from
> the jitter that flipping the tick inflicts, but also wants as little
> disturbance as possible, so RCU offload and whatever other measures that
> are or become available are perhaps interesting to it as well.  The
> numbers showed that here and now the two modes can work together in the
> same box, I can have my rt set ticking away, and other cores doing
> tickless compute, but enabling that via common config (distros don't
> want to ship many kernel flavors) has a cost to rt performance.
> 
> Ideally, bean counting would be switchable too, giving all components
> the environment they like best.

Sounds like a question for Frederic (now CCed).  ;-)

							Thanx, Paul

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-19  5:34               ` Paul E. McKenney
@ 2014-05-20 14:53                 ` Frederic Weisbecker
  2014-05-20 15:53                   ` Paul E. McKenney
  2014-05-21  3:52                   ` Mike Galbraith
  0 siblings, 2 replies; 31+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 14:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote:
> On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > > 
> > > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > > also ensure that there is only one runnable task per CPU, I completely
> > > > > agree.  If you are saying something else, you lost me.  ;-)
> > > > 
> > > > Yup, that's it more or less.  It's not only single task loads that could
> > > > benefit from better isolation, but if isolation improving measures are
> > > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > > isolation improvements.
> > > 
> > > So you are arguing for a separate Kconfig variable that does the isolation?
> > > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > > uses this new variable to decide when to pin the grace-period kthreads
> > > onto the housekeeping CPU?
> > 
> > I'm thinking more about runtime, but yes.
> > 
> > The tick mode really wants to be selectable per set (in my boxen you can
> > switch between nohz off/idle, but not yet nohz_full, that might get real
> > interesting).  You saw in my numbers that ticked is far better for the
> > threaded rt load, but what if the total load has both sensitive rt and
> > compute components to worry about?  The rt component wants relief from
> > the jitter that flipping the tick inflicts, but also wants as little
> > disturbance as possible, so RCU offload and whatever other measures that
> > are or become available are perhaps interesting to it as well.  The
> > numbers showed that here and now the two modes can work together in the
> > same box, I can have my rt set ticking away, and other cores doing
> > tickless compute, but enabling that via common config (distros don't
> > want to ship many kernel flavors) has a cost to rt performance.
> > 
> > Ideally, bean counting would be switchable too, giving all components
> > the environment they like best.
> 
> Sounds like a question for Frederic (now CCed).  ;-)

I'm not sure that I really understand what you want here.

The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
is actually off by default. This is only overridden by the "nohz_full=" boot parameter.

Now if what you need is to enable or disable it at runtime instead of boottime,
I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
and RCU).

I've already been eyed by vulturous frozen sharks flying in circles above me lately
after a few overengineering visions.

And given that the full nohz code is still in its infancy, it's probably not the right
time to expand it that way. I haven't even heard yet about users who have gotten past the
testing stage of full nohz.

We'll probably extend it that way in the future, but likely not in the near future.

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 14:53                 ` Frederic Weisbecker
@ 2014-05-20 15:53                   ` Paul E. McKenney
  2014-05-20 16:24                     ` Frederic Weisbecker
  2014-05-21  4:18                     ` Mike Galbraith
  2014-05-21  3:52                   ` Mike Galbraith
  1 sibling, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-20 15:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote:
> > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > > > also ensure that there is only one runnable task per CPU, I completely
> > > > > > agree.  If you are saying something else, you lost me.  ;-)
> > > > > 
> > > > > Yup, that's it more or less.  It's not only single task loads that could
> > > > > benefit from better isolation, but if isolation improving measures are
> > > > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > > > isolation improvements.
> > > > 
> > > > So you are arguing for a separate Kconfig variable that does the isolation?
> > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > > > uses this new variable to decide when to pin the grace-period kthreads
> > > > onto the housekeeping CPU?
> > > 
> > > I'm thinking more about runtime, but yes.
> > > 
> > > The tick mode really wants to be selectable per set (in my boxen you can
> > > switch between nohz off/idle, but not yet nohz_full, that might get real
> > > interesting).  You saw in my numbers that ticked is far better for the
> > > threaded rt load, but what if the total load has both sensitive rt and
> > > compute components to worry about?  The rt component wants relief from
> > > the jitter that flipping the tick inflicts, but also wants as little
> > > disturbance as possible, so RCU offload and whatever other measures that
> > > are or become available are perhaps interesting to it as well.  The
> > > numbers showed that here and now the two modes can work together in the
> > > same box, I can have my rt set ticking away, and other cores doing
> > > tickless compute, but enabling that via common config (distros don't
> > > want to ship many kernel flavors) has a cost to rt performance.
> > > 
> > > Ideally, bean counting would be switchable too, giving all components
> > > the environment they like best.
> > 
> > Sounds like a question for Frederic (now CCed).  ;-)
> 
> I'm not sure that I really understand what you want here.
> 
> The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> is actually off by default. This is only overriden by "nohz_full=" boot parameter.

If I understand correctly, if there is no nohz_full= boot parameter,
then the context-tracking code takes the early exit via the
context_tracking_is_enabled() check in context_tracking_user_enter().
I would not expect this to cause much in the way of syscall performance
degradation.  However, it looks like having even one CPU in nohz_full
mode causes all CPUs to enable context tracking.
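
To make the point above concrete, here is a rough user-space model of it.
This is only a sketch under stated assumptions: every model_* name is
hypothetical and none of it is the kernel's actual context-tracking code.
It shows a single global switch, an early exit when no nohz_full mask was
given, and the bookkeeping cost landing on every CPU once the mask is
non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

enum model_ctx { MODEL_IN_KERNEL = 0, MODEL_IN_USER = 1 };

/* Hypothetical model state; not the kernel's data structures. */
static bool model_tracking_enabled;               /* one global switch */
static enum model_ctx model_state[MODEL_NR_CPUS]; /* per-CPU context */
static unsigned long model_tracked_transitions;   /* bookkeeping cost paid */

/* "Boot": any non-empty nohz_full mask turns tracking on for every CPU. */
static void model_boot(uint64_t nohz_full_mask)
{
	model_tracking_enabled = (nohz_full_mask != 0);
	model_tracked_transitions = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_state[cpu] = MODEL_IN_KERNEL;
}

/* Kernel -> user transition on 'cpu'. */
static void model_user_enter(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_tracking_enabled)
		return;                  /* early exit: no nohz_full= supplied */
	model_state[cpu] = MODEL_IN_USER;
	model_tracked_transitions++;     /* paid even on non-nohz_full CPUs */
}
```

In this model, calling model_boot(0) and then model_user_enter(0) leaves
the counter at zero, while model_boot(1ULL << 4) makes the same call on
CPU 0 pay for tracking even though only CPU 4 is in the mask.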

My guess is that Mike wants to have (say) half of his CPUs running
nohz_full, and the other half having fast system calls.  So my guess
also is that he would like some way of having the non-nohz_full CPUs
to opt out of the context-tracking overhead, including the memory
barriers and atomic ops in rcu_user_enter() and rcu_user_exit().  ;-)

> Now if what you need is to enable or disable it at runtime instead of boottime,
> I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> and RCU).

What Frederic said!  Making RCU deal with this is possible, but a bit on
the complicated side.  Given that I haven't heard too many people complaining
that RCU is too simple, I would like to opt out of runtime changes to the
nohz_full mask.

> I've already been eyed by vulturous frozen sharks flying in circles above me lately
> after a few overengineering visions.

Nothing like the icy glare of a frozen shark, is there?  ;-)

> And given that the full nohz code is still in a baby shape, it's probably not the right
> time to expand it that way. I haven't even yet heard about users who crossed the testing
> stage of full nohz.
> 
> We'll probably extend it that way in the future. But likely not in a near future.

My guess is that Mike would be OK with making nohz_full choice of CPUs
still at boot time, but that he would like the CPUs that are not to be
in nohz_full state be able to opt out of the context-tracking overhead.

Mike, please let us all know if I am misunderstanding what you are
looking for.

							Thanx, Paul

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 15:53                   ` Paul E. McKenney
@ 2014-05-20 16:24                     ` Frederic Weisbecker
  2014-05-20 16:36                       ` Peter Zijlstra
  2014-05-20 17:20                       ` Paul E. McKenney
  2014-05-21  4:18                     ` Mike Galbraith
  1 sibling, 2 replies; 31+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 16:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > I'm not sure that I really understand what you want here.
> > 
> > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> > is actually off by default. This is only overriden by "nohz_full=" boot parameter.
> 
> If I understand correctly, if there is no nohz_full= boot parameter,
> then the context-tracking code takes the early exit via the
> context_tracking_is_enabled() check in context_tracking_user_enter().

Exactly. It's even jump labeled, so on architectures with good jump label support
it should reduce to a single unconditional jump when it's off.

> I would not expect this to cause much in the way of syscall performance
> degradation.

Now the jump label concerns all cases but syscalls (that is, exceptions and irqs).
Syscalls are even better optimized for the off case, with a TIF_NOHZ flag, so the
check folds into the all-in-one slow path condition. At least on x86.

> However, it looks like having even one CPU in nohz_full
> mode causes all CPUs to enable context tracking.

True, unfortunately. It's necessary in order to track syscall and exception
entry/exit across CPUs.

So if CPU 1 is full nohz and a task enters userspace on CPU 0 and then migrates
to CPU 1, we must know there that it's resuming in userspace in order to stop the tick
confidently. So CPU 0 must do context tracking as well.
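
A toy model of that migration case, as a sketch only: the toy_* names and
the per-task flags are assumptions made for illustration and do not
mirror how the kernel stores this state. The flag really_in_user stands
for the hard boundary, tracked_in_user for what the soft probes recorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy task; the fields are illustrative assumptions. */
struct toy_task {
	bool really_in_user;   /* ground truth (hard boundary) */
	bool tracked_in_user;  /* what was recorded (soft boundary) */
	int cpu;               /* CPU the task currently sits on */
};

/* The task returns to userspace; recorded only if its CPU does tracking. */
static void toy_user_enter(struct toy_task *t, bool cpu_tracks)
{
	assert(t != 0);
	t->really_in_user = true;
	if (cpu_tracks)
		t->tracked_in_user = true;
}

/* Move the task to another CPU without touching its context state. */
static void toy_migrate(struct toy_task *t, int dest_cpu)
{
	assert(t != 0);
	t->cpu = dest_cpu;
}

/* On the full nohz destination: only recorded knowledge can be trusted. */
static bool toy_can_stop_tick(const struct toy_task *t)
{
	assert(t != 0);
	return t->tracked_in_user;
}
```

If CPU 0 tracks (cpu_tracks == true), toy_can_stop_tick() returns true
after the migration to CPU 1. If it does not, the task really is in
userspace but the destination has no record of that, so in this model the
tick cannot be stopped with confidence.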

Of course one can argue that we can find out that the task is resuming in userspace from
CPU 0 scheduler entry without the need for previous context tracking, but I couldn't find a safe
solution for that. This is because probing on user/kernel boundaries can only be done
in the soft way, through explicit function calls. So there is an inevitable shift
between soft and hard boundaries, between what we probe and what we can guess.

> 
> My guess is that Mike wants to have (say) half of his CPUs running
> nohz_full, and the other half having fast system calls.  So my guess
> also is that he would like some way of having the non-nohz_full CPUs
> to opt out of the context-tracking overhead, including the memory
> barriers and atomic ops in rcu_user_enter() and rcu_user_exit().  ;-)

I see. So we could possibly restrict the context tracking to a bunch of
CPUs, but only if the tasks running there can't run on non-tracking CPUs.

Ah one possible thing is to rely on the NOHZ flag for that and check which
task needs to be tracked.
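
As a rough sketch of that condition, assuming CPU sets are plain 64-bit
masks and that the tracking CPUs are exactly the nohz_full ones (both are
assumptions for illustration, and the sketch_* helpers are hypothetical,
not existing kernel functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A task needs tracking if its affinity lets it reach any nohz_full CPU. */
static bool sketch_task_needs_tracking(uint64_t allowed_cpus,
				       uint64_t nohz_full_mask)
{
	return (allowed_cpus & nohz_full_mask) != 0;
}

/*
 * A CPU may skip tracking only if it is not nohz_full itself and no task
 * allowed to run on it could also run on a nohz_full CPU.
 */
static bool sketch_cpu_may_skip_tracking(int cpu,
					 const uint64_t *task_allowed,
					 int nr_tasks,
					 uint64_t nohz_full_mask)
{
	assert(cpu >= 0 && cpu < 64);
	uint64_t bit = UINT64_C(1) << cpu;

	if (nohz_full_mask & bit)
		return false;
	for (int i = 0; i < nr_tasks; i++) {
		if ((task_allowed[i] & bit) &&
		    sketch_task_needs_tracking(task_allowed[i], nohz_full_mask))
			return false;
	}
	return true;
}
```

With nohz_full on CPUs 4-7 (mask 0xF0), one task confined to CPUs 0-1
(0x03) and another allowed on CPUs 2 and 4 (0x14), CPU 0 may skip
tracking but CPU 2 may not, because the second task can move from CPU 2
to a nohz_full CPU.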

> > Now if what you need is to enable or disable it at runtime instead of boottime,
> > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> > and RCU).
> 
> What Frederic said!  Making RCU deal with this is possible, but a bit on
> the complicated side.  Given that I haven't heard too many people complaining
> that RCU is too simple, I would like to opt out of runtime changes to the
> nohz_full mask.

Agreed.

> 
> > I've already been eyed by vulturous frozen sharks flying in circles above me lately
> > after a few overengineering visions.
> 
> Nothing like the icy glare of a frozen shark, is there?  ;-)

I think they were even three-eyed!!!

> 
> > And given that the full nohz code is still in a baby shape, it's probably not the right
> > time to expand it that way. I haven't even yet heard about users who crossed the testing
> > stage of full nohz.
> > 
> > We'll probably extend it that way in the future. But likely not in a near future.
> 
> My guess is that Mike would be OK with making nohz_full choice of CPUs
> still at boot time, but that he would like the CPUs that are not to be
> in nohz_full state be able to opt out of the context-tracking overhead.

OK, that might be possible, although it would still require a bit of complication.
Let's wait for Mike's input.

Thanks.

* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 16:24                     ` Frederic Weisbecker
@ 2014-05-20 16:36                       ` Peter Zijlstra
  2014-05-20 17:20                       ` Paul E. McKenney
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-20 16:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, Mike Galbraith, Paul Gortmaker, linux-kernel,
	linux-rt-users, Ingo Molnar, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote:
> Of course one can argue that we can find out that the task is resuming in userspace from
> CPU 0 scheduler entry without the need for previous context tracking, but I couldn't find safe
> solution for that. This is because probing on user/kernel boundaries can only be done
> in the soft way, through explicit function calls. So there is an inevitable shift
> between soft and hard boundaries, between what we probe and what we can guess.

You can hook into set_task_cpu(). Not sure it's going to be pretty, but
that is _the_ place to hook migration related nonsense.


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 16:24                     ` Frederic Weisbecker
  2014-05-20 16:36                       ` Peter Zijlstra
@ 2014-05-20 17:20                       ` Paul E. McKenney
  2014-05-21  4:29                         ` Mike Galbraith
  1 sibling, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-20 17:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote:
> On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:

[ . . . ]

> > > We'll probably extend it that way in the future. But likely not in a near future.
> > 
> > My guess is that Mike would be OK with making nohz_full choice of CPUs
> > still at boot time, but that he would like the CPUs that are not to be
> > in nohz_full state be able to opt out of the context-tracking overhead.
> 
> OK, that might be possible, although it would still require a bit of complication.
> Let's wait for Mike's input.

Sounds good!

Mike, would this do what you need?

							Thanx, Paul


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 17:20                       ` Paul E. McKenney
@ 2014-05-21  4:29                         ` Mike Galbraith
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-05-21  4:29 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, 2014-05-20 at 10:20 -0700, Paul E. McKenney wrote: 
> On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote:
> > On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> 
> [ . . . ]
> 
> > > > We'll probably extend it that way in the future. But likely not in a near future.
> > > 
> > > My guess is that Mike would be OK with making nohz_full choice of CPUs
> > > still at boot time, but that he would like the CPUs that are not to be
> > > in nohz_full state be able to opt out of the context-tracking overhead.
> > 
> > OK, that might be possible, although it would still require a bit of complication.
> > Let's wait for Mike's input.
> 
> Sounds good!
> 
> Mike, would this do what you need?

I don't _have_ a here and now need at all, I'm just looking at the
possibilities.  For the users I'm aware of here and now, I'm pretty sure
they'd be tickled pink with it as it sits ('course tickled pink will
quickly become "I see a 1.073us perturbation once every three weeks").

-Mike 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 15:53                   ` Paul E. McKenney
  2014-05-20 16:24                     ` Frederic Weisbecker
@ 2014-05-21  4:18                     ` Mike Galbraith
  2014-05-21 12:03                       ` Paul E. McKenney
  1 sibling, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-21  4:18 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, 2014-05-20 at 08:53 -0700, Paul E. McKenney wrote: 
> On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote:
> > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > > > > 
> > > > > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > > > > also ensure that there is only one runnable task per CPU, I completely
> > > > > > > agree.  If you are saying something else, you lost me.  ;-)
> > > > > > 
> > > > > > Yup, that's it more or less.  It's not only single task loads that could
> > > > > > benefit from better isolation, but if isolation improving measures are
> > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > > > > isolation improvements.
> > > > > 
> > > > > So you are arguing for a separate Kconfig variable that does the isolation?
> > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > > > > uses this new variable to decide when to pin the grace-period kthreads
> > > > > onto the housekeeping CPU?
> > > > 
> > > > I'm thinking more about runtime, but yes.
> > > > 
> > > > The tick mode really wants to be selectable per set (in my boxen you can
> > > > switch between nohz off/idle, but not yet nohz_full, that might get real
> > > > interesting).  You saw in my numbers that ticked is far better for the
> > > > threaded rt load, but what if the total load has both sensitive rt and
> > > > compute components to worry about?  The rt component wants relief from
> > > > the jitter that flipping the tick inflicts, but also wants as little
> > > > disturbance as possible, so RCU offload and whatever other measures that
> > > > are or become available are perhaps interesting to it as well.  The
> > > > numbers showed that here and now the two modes can work together in the
> > > > same box, I can have my rt set ticking away, and other cores doing
> > > > tickless compute, but enabling that via common config (distros don't
> > > > want to ship many kernel flavors) has a cost to rt performance.
> > > > 
> > > > Ideally, bean counting would be switchable too, giving all components
> > > > the environment they like best.
> > > 
> > > Sounds like a question for Frederic (now CCed).  ;-)
> > 
> > I'm not sure that I really understand what you want here.
> > 
> > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> > is actually off by default. This is only overridden by "nohz_full=" boot parameter.
> 
> If I understand correctly, if there is no nohz_full= boot parameter,
> then the context-tracking code takes the early exit via the
> context_tracking_is_enabled() check in context_tracking_user_enter().
> I would not expect this to cause much in the way of syscall performance
> degradation.  However, it looks like having even one CPU in nohz_full
> mode causes all CPUs to enable context tracking.
> 
> My guess is that Mike wants to have (say) half of his CPUs running
> nohz_full, and the other half having fast system calls.  So my guess
> also is that he would like some way of having the non-nohz_full CPUs
> to opt out of the context-tracking overhead, including the memory
> barriers and atomic ops in rcu_user_enter() and rcu_user_exit().  ;-)

Bingo.

> > Now if what you need is to enable or disable it at runtime instead of boottime,
> > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> > and RCU).
> 
> What Frederic said!  Making RCU deal with this is possible, but a bit on
> the complicated side.  Given that I haven't heard too many people complaining
> that RCU is too simple, I would like to opt out of runtime changes to the
> nohz_full mask.
> 
> > I've already been eyed by vulturous frozen sharks flying in circles above me lately
> > after a few overengineering visions.
> 
> Nothing like the icy glare of a frozen shark, is there?  ;-)
> 
> > And given that the full nohz code is still in a baby shape, it's probably not the right
> > time to expand it that way. I haven't even yet heard about users who crossed the testing
> > stage of full nohz.
> > 
> > We'll probably extend it that way in the future. But likely not in a near future.
> 
> My guess is that Mike would be OK with making nohz_full choice of CPUs
> still at boot time, but that he would like the CPUs that are not to be
> in nohz_full state be able to opt out of the context-tracking overhead.
> 
> Mike, please let us all know if I am misunderstanding what you are
> looking for.

Yup, exactly.  As it sits, you couldn't possibly ship nohz_full out to
the real world in any other form than a specialty kernel.  There's no
doubt in my mind that there are users out there who would love to have
high performance rt and compute in the same box though.  I can imagine
them lurking here and slobbering profusely ;-)

-Mike


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-21  4:18                     ` Mike Galbraith
@ 2014-05-21 12:03                       ` Paul E. McKenney
  0 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-21 12:03 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, May 21, 2014 at 06:18:01AM +0200, Mike Galbraith wrote:
> On Tue, 2014-05-20 at 08:53 -0700, Paul E. McKenney wrote: 
> > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote:
> > > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> > > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > > > > > 
> > > > > > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > > > > > also ensure that there is only one runnable task per CPU, I completely
> > > > > > > > agree.  If you are saying something else, you lost me.  ;-)
> > > > > > > 
> > > > > > > Yup, that's it more or less.  It's not only single task loads that could
> > > > > > > benefit from better isolation, but if isolation improving measures are
> > > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > > > > > isolation improvements.
> > > > > > 
> > > > > > So you are arguing for a separate Kconfig variable that does the isolation?
> > > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > > > > > uses this new variable to decide when to pin the grace-period kthreads
> > > > > > onto the housekeeping CPU?
> > > > > 
> > > > > I'm thinking more about runtime, but yes.
> > > > > 
> > > > > The tick mode really wants to be selectable per set (in my boxen you can
> > > > > switch between nohz off/idle, but not yet nohz_full, that might get real
> > > > > interesting).  You saw in my numbers that ticked is far better for the
> > > > > threaded rt load, but what if the total load has both sensitive rt and
> > > > > compute components to worry about?  The rt component wants relief from
> > > > > the jitter that flipping the tick inflicts, but also wants as little
> > > > > disturbance as possible, so RCU offload and whatever other measures that
> > > > > are or become available are perhaps interesting to it as well.  The
> > > > > numbers showed that here and now the two modes can work together in the
> > > > > same box, I can have my rt set ticking away, and other cores doing
> > > > > tickless compute, but enabling that via common config (distros don't
> > > > > want to ship many kernel flavors) has a cost to rt performance.
> > > > > 
> > > > > Ideally, bean counting would be switchable too, giving all components
> > > > > the environment they like best.
> > > > 
> > > > Sounds like a question for Frederic (now CCed).  ;-)
> > > 
> > > I'm not sure that I really understand what you want here.
> > > 
> > > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> > > is actually off by default. This is only overridden by "nohz_full=" boot parameter.
> > 
> > If I understand correctly, if there is no nohz_full= boot parameter,
> > then the context-tracking code takes the early exit via the
> > context_tracking_is_enabled() check in context_tracking_user_enter().
> > I would not expect this to cause much in the way of syscall performance
> > degradation.  However, it looks like having even one CPU in nohz_full
> > mode causes all CPUs to enable context tracking.
> > 
> > My guess is that Mike wants to have (say) half of his CPUs running
> > nohz_full, and the other half having fast system calls.  So my guess
> > also is that he would like some way of having the non-nohz_full CPUs
> > to opt out of the context-tracking overhead, including the memory
> > barriers and atomic ops in rcu_user_enter() and rcu_user_exit().  ;-)
> 
> Bingo.
> 
> > > Now if what you need is to enable or disable it at runtime instead of boottime,
> > > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> > > and RCU).
> > 
> > What Frederic said!  Making RCU deal with this is possible, but a bit on
> > the complicated side.  Given that I haven't heard too many people complaining
> > that RCU is too simple, I would like to opt out of runtime changes to the
> > nohz_full mask.
> > 
> > > I've already been eyed by vulturous frozen sharks flying in circles above me lately
> > > after a few overengineering visions.
> > 
> > Nothing like the icy glare of a frozen shark, is there?  ;-)
> > 
> > > And given that the full nohz code is still in a baby shape, it's probably not the right
> > > time to expand it that way. I haven't even yet heard about users who crossed the testing
> > > stage of full nohz.
> > > 
> > > We'll probably extend it that way in the future. But likely not in a near future.
> > 
> > My guess is that Mike would be OK with making nohz_full choice of CPUs
> > still at boot time, but that he would like the CPUs that are not to be
> > in nohz_full state be able to opt out of the context-tracking overhead.
> > 
> > Mike, please let us all know if I am misunderstanding what you are
> > looking for.
> 
> Yup, exactly.  As it sits, you couldn't possibly ship nohz_full out to
> the real world in any other form than a specialty kernel.  There's no
> doubt in my mind that there are users out there who would love to have
> high performance rt and compute in the same box though.  I can imagine
> them lurking here and slobbering profusely ;-)

I think that shipping nohz_full out to the real world is already
happening, but that turning on nohz_full at boot time is expected to
clobber syscall performance globally.

Still, I can see the attraction of avoiding clobbering the syscall
performance on the non-nohz_full CPUs when nohz_full is enabled on
only some of the CPUs.

							Thanx, Paul


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-20 14:53                 ` Frederic Weisbecker
  2014-05-20 15:53                   ` Paul E. McKenney
@ 2014-05-21  3:52                   ` Mike Galbraith
  1 sibling, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2014-05-21  3:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, 2014-05-20 at 16:53 +0200, Frederic Weisbecker wrote: 
> On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote:
> > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: 
> > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > > > also ensure that there is only one runnable task per CPU, I completely
> > > > > > agree.  If you are saying something else, you lost me.  ;-)
> > > > > 
> > > > > Yup, that's it more or less.  It's not only single task loads that could
> > > > > benefit from better isolation, but if isolation improving measures are
> > > > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > > > isolation improvements.
> > > > 
> > > > So you are arguing for a separate Kconfig variable that does the isolation?
> > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > > > uses this new variable to decide when to pin the grace-period kthreads
> > > > onto the housekeeping CPU?
> > > 
> > > I'm thinking more about runtime, but yes.
> > > 
> > > The tick mode really wants to be selectable per set (in my boxen you can
> > > switch between nohz off/idle, but not yet nohz_full, that might get real
> > > interesting).  You saw in my numbers that ticked is far better for the
> > > threaded rt load, but what if the total load has both sensitive rt and
> > > compute components to worry about?  The rt component wants relief from
> > > the jitter that flipping the tick inflicts, but also wants as little
> > > disturbance as possible, so RCU offload and whatever other measures that
> > > are or become available are perhaps interesting to it as well.  The
> > > numbers showed that here and now the two modes can work together in the
> > > same box, I can have my rt set ticking away, and other cores doing
> > > tickless compute, but enabling that via common config (distros don't
> > > want to ship many kernel flavors) has a cost to rt performance.
> > > 
> > > Ideally, bean counting would be switchable too, giving all components
> > > the environment they like best.
> > 
> > Sounds like a question for Frederic (now CCed).  ;-)
> 
> I'm not sure that I really understand what you want here.
> 
> The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks
> is actually off by default. This is only overridden by "nohz_full=" boot parameter.
> 
> Now if what you need is to enable or disable it at runtime instead of boottime,
> I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched
> and RCU).

Yeah, that would be the most flexible (not to mention invasive).  That
said, users of nohz_full are likely gonna be very few and far between,
so maybe no big deal.  I'm just looking at it from a distro perspective,
what can it, can it not do, what does it cost, to see how it would best
be served once baked.

> I've already been eyed by vulturous frozen sharks flying in circles above me lately
> after a few overengineering visions.

:)

> And given that the full nohz code is still in a baby shape, it's probably not the right
> time to expand it that way. I haven't even yet heard about users who crossed the testing
> stage of full nohz.
> 
> We'll probably extend it that way in the future. But likely not in a near future.

Yeah, understood, I'm just measuring and pondering potentials.

-Mike



* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  3:18   ` Mike Galbraith
  2014-05-15 14:45     ` Paul E. McKenney
  2014-05-18  4:22     ` Mike Galbraith
@ 2014-05-19 10:54     ` Peter Zijlstra
  2 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-19 10:54 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: paulmck, Paul Gortmaker, linux-kernel, linux-rt-users,
	Ingo Molnar, Steven Rostedt, Thomas Gleixner


On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> 
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
> 
> I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash. 
> 
> Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> fast mover performance last time I measured.

Syscall entry and exit times are through the roof with it. So anything
doing loads of syscalls will suffer badly.



* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
  2014-05-14 15:44 ` Paul E. McKenney
@ 2014-05-19 12:40 ` Peter Zijlstra
  2014-05-22 19:40   ` Paul Gortmaker
  2014-11-27 11:21 ` Wanpeng Li
  2 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-19 12:40 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Steven Rostedt,
	Thomas Gleixner, Paul E. McKenney


On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
> 
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context.  Consider the following test:


So the alternative approach is something like the below, where we will
not let it borrow more than the global bandwidth per cpu.

This whole sharing thing is completely fail anyway, but I really
wouldn't know what else to do and keep allowing RT tasks to set random
cpu affinities.
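
For illustration only, the capping idea can be sketched in user space roughly
as below.  This is not kernel code: clamp_borrow_ns() and
throttle_check_skipped() are hypothetical names, the per-cpu division by the
root domain weight is left out, and the only thing taken from the scheduler
is the "runtime >= period" early return in sched_rt_runtime_exceeded().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: how much of a neighbour's spare
 * runtime a runqueue may take without its own budget exceeding cap_ns.
 */
static uint64_t clamp_borrow_ns(uint64_t cur_ns, uint64_t spare_ns,
				uint64_t cap_ns)
{
	if (cur_ns >= cap_ns)
		return 0;			/* already at or above the cap */
	if (spare_ns > cap_ns - cur_ns)
		return cap_ns - cur_ns;		/* take only what fits under the cap */
	return spare_ns;
}

/*
 * Mirrors the early return in sched_rt_runtime_exceeded(): once runtime
 * has grown to the full period, the throttling checks are never reached.
 */
static bool throttle_check_skipped(uint64_t runtime_ns, uint64_t period_ns)
{
	return runtime_ns >= period_ns;
}
```

With a 1s period and 950ms of runtime, a cap equal to the period lets the
runqueue borrow the remaining 50ms and reach 100%, so the throttle check is
skipped.  With the cap at 950ms the borrow is zero and the check still runs.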


---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7386,10 +7386,59 @@ static int __rt_schedulable(struct task_
 	return ret;
 }
 
+/*
+ * ret := (a * b) / d
+ */
+static u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
+{
+	/*
+	 * Compute the 128bit product:
+	 *   a * b ->
+	 *     [ a = (ah * 2^32 + al),  b = (bh * 2^32 + bl) ]
+	 *   -> (ah * bh) * 2^64 + (ah * bl + al * bh) * 2^32 + al * bl
+	 */
+	u32 ah = (a >> 32);
+	u32 bh = (b >> 32);
+	u32 al = a;
+	u32 bl = b;
+
+	u64 mh, mm, ml;
+
+	mh = (u64)ah * bh;
+	mm = (u64)ah * bl + (u64)al * bh;
+	ml = (u64)al * bl;
+
+	mh += mm >> 32;
+	mm <<= 32;
+
+	ml += mm;
+	if (ml < mm) /* overflow */
+		mh++;
+
+	/*
+	 * Reduce the 128bit result to fit in a 64bit dividend:
+	 *   m / d -> (m / 2^n) / (d / 2^n)
+	 */
+	while (mh) {
+		ml >>= 1;
+		if (mh & 1)
+			ml |= 1ULL << 63;
+		mh >>= 1;
+		d >>= 1;
+	}
+
+	if (unlikely(!d))
+		return ml;
+
+	return div64_u64(ml, d);
+}
+
 static int tg_set_rt_bandwidth(struct task_group *tg,
 		u64 rt_period, u64 rt_runtime)
 {
 	int i, err = 0;
+	u64 g_period = global_rt_period();
+	u64 g_runtime = global_rt_runtime();
 
 	mutex_lock(&rt_constraints_mutex);
 	read_lock(&tasklist_lock);
@@ -7400,6 +7449,9 @@ static int tg_set_rt_bandwidth(struct ta
 	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
 	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
 	tg->rt_bandwidth.rt_runtime = rt_runtime;
+	tg->rt_bandwidth.rt_max_runtime = (g_runtime == RUNTIME_INF) ?
+		rt_period :
+		mul_u64_u64_div_u64(rt_period, g_runtime, g_period);
 
 	for_each_possible_cpu(i) {
 		struct rt_rq *rt_rq = tg->rt_rq[i];
@@ -7577,6 +7629,7 @@ static int sched_rt_global_validate(void
 static void sched_rt_do_global(void)
 {
 	def_rt_bandwidth.rt_runtime = global_rt_runtime();
+	def_rt_bandwidth.rt_max_runtime = global_rt_runtime();
 	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
 }
 
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -614,12 +614,12 @@ static int do_balance_runtime(struct rt_
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
 	int i, weight, more = 0;
-	u64 rt_period;
+	u64 rt_max_runtime;
 
 	weight = cpumask_weight(rd->span);
 
 	raw_spin_lock(&rt_b->rt_runtime_lock);
-	rt_period = ktime_to_ns(rt_b->rt_period);
+	rt_max_runtime = rt_b->rt_max_runtime;
 	for_each_cpu(i, rd->span) {
 		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
 		s64 diff;
@@ -643,12 +643,12 @@ static int do_balance_runtime(struct rt_
 		diff = iter->rt_runtime - iter->rt_time;
 		if (diff > 0) {
 			diff = div_u64((u64)diff, weight);
-			if (rt_rq->rt_runtime + diff > rt_period)
-				diff = rt_period - rt_rq->rt_runtime;
+			if (rt_rq->rt_runtime + diff > rt_max_runtime)
+				diff = rt_max_runtime - rt_rq->rt_runtime;
 			iter->rt_runtime -= diff;
 			rt_rq->rt_runtime += diff;
 			more = 1;
-			if (rt_rq->rt_runtime == rt_period) {
+			if (rt_rq->rt_runtime == rt_max_runtime) {
 				raw_spin_unlock(&iter->rt_runtime_lock);
 				break;
 			}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -124,6 +124,7 @@ struct rt_bandwidth {
 	raw_spinlock_t		rt_runtime_lock;
 	ktime_t			rt_period;
 	u64			rt_runtime;
+	u64			rt_max_runtime;
 	struct hrtimer		rt_period_timer;
 };
 /*
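
For anyone wanting to poke at the multiply/divide helper outside the kernel,
a user-space port might look like the sketch below.  Assumptions: the name
mul_div_sketch() is made up, div64_u64() is replaced by a plain 64-bit
division, and the carry check on the middle partial sum is an addition in
this sketch that is not part of the patch above.  As in the patch, the
result is only approximate once the product no longer fits in 64 bits,
because the divisor is shifted right along with the product.

```c
#include <stdint.h>

/* ret := (a * b) / d, approximate when a * b needs more than 64 bits */
static uint64_t mul_div_sketch(uint64_t a, uint64_t b, uint64_t d)
{
	uint32_t ah = (uint32_t)(a >> 32), al = (uint32_t)a;
	uint32_t bh = (uint32_t)(b >> 32), bl = (uint32_t)b;

	uint64_t mh = (uint64_t)ah * bh;	/* weight 2^64 */
	uint64_t m1 = (uint64_t)ah * bl;	/* weight 2^32 */
	uint64_t m2 = (uint64_t)al * bh;	/* weight 2^32 */
	uint64_t ml = (uint64_t)al * bl;	/* weight 2^0  */
	uint64_t mm = m1 + m2;

	if (mm < m1)			/* carry out of the middle sum is 2^96 */
		mh += 1ULL << 32;

	mh += mm >> 32;
	mm <<= 32;

	ml += mm;
	if (ml < mm)			/* carry from the low word */
		mh++;

	/* Shift product and divisor right together until the product fits. */
	while (mh) {
		ml >>= 1;
		if (mh & 1)
			ml |= 1ULL << 63;
		mh >>= 1;
		d >>= 1;
	}

	if (!d)				/* divisor shifted away entirely */
		return ml;

	return ml / d;
}
```

For the default 1s period and 950ms global runtime the product fits in
64 bits, so the loop never runs and the result is exact.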



* Re: Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-19 12:40 ` Peter Zijlstra
@ 2014-05-22 19:40   ` Paul Gortmaker
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-22 19:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Steven Rostedt,
	Thomas Gleixner, Paul E. McKenney

On 14-05-19 08:40 AM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
>> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
>> ("sched: rt-group: smp balancing") the concept of borrowing per
>> cpu rt_runtime from one core to another was introduced.
>>
>> However, this prevents the RT throttling message from ever being
>> emitted when someone does a common (but mistaken) attempt at
>> using too much CPU in RT context.  Consider the following test:
> 
> 
> So the alternative approach is something like the below, where we will
> not let it borrow more than the global bandwidth per cpu.
> 
> This whole sharing thing is completely fail anyway, but I really
> wouldn't know what else to do and keep allowing RT tasks to set random
> cpu affinities.

So, for the record, this does seem to work, in the sense that the
original test of:

  echo "main() {for(;;);}" > full_load.c
  gcc full_load.c -o full_load
  taskset -c 1 ./full_load &
  chrt -r -p 80 `pidof full_load`

will emit the sched delayed throttling message instead of the less
informative (and 20s delayed) RCU stall.  Which IMHO is a win in
terms of being more friendly to the less informed users out there.

I'd re-tested it on today's linux-next tree, with RT_GROUP_SCHED off.

The downside is that we get another tuning knob that will inevitably
end up in /proc/sys/kernel/ and we'll need to explain somewhere how
the new max_runtime relates to the existing rt_runtime and rt_period.
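
As a rough starting point for that explanation, the relation in the patch is
max_runtime = rt_period * global_runtime / global_period.  A small user-space
sketch in microseconds follows; max_runtime_us() is a hypothetical name, and
the -1 convention is borrowed from the existing sched_rt_runtime_us sysctl.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cap (in us) for a group with period group_period_us,
 * given the global runtime/period pair.  A negative global runtime means
 * "no limit", in which case the cap is the whole period.
 */
static uint64_t max_runtime_us(uint64_t group_period_us,
			       int64_t g_runtime_us, uint64_t g_period_us)
{
	if (g_runtime_us < 0)
		return group_period_us;
	if (g_period_us == 0)
		return 0;	/* defensive guard for this sketch only */

	/* Assumes the product fits in 64 bits (true for second-scale us values). */
	return group_period_us * (uint64_t)g_runtime_us / g_period_us;
}
```

With the defaults of 950000us runtime in a 1000000us period, a group with a
500000us period would be capped at 475000us.  For the default bandwidth the
patch simply sets rt_max_runtime to global_rt_runtime(), so there the cap
equals rt_runtime.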

I'm still unsure what the best solution for mainline is.  Clearly
just defaulting the sched feat to false is the simplest, and given
your description of it as "fail" perhaps that does make sense.  :)

Paul.
--

> 
> 
> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7386,10 +7386,59 @@ static int __rt_schedulable(struct task_
>  	return ret;
>  }
>  
> +/*
> + * ret := (a * b) / d
> + */
> +static u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> +{
> +	/*
> +	 * Compute the 128bit product:
> +	 *   a * b ->
> +	 *     [ a = (ah * 2^32 + al),  b = (bh * 2^32 + bl) ]
> +	 *   -> (ah * bh) * 2^64 + (ah * bl + al * bh) * 2^32 + al * bl
> +	 */
> +	u32 ah = (a >> 32);
> +	u32 bh = (b >> 32);
> +	u32 al = a;
> +	u32 bl = b;
> +
> +	u64 mh, mm, ml;
> +
> +	mh = (u64)ah * bh;
> +	mm = (u64)ah * bl + (u64)al * bh;
> +	ml = (u64)al * bl;
> +
> +	mh += mm >> 32;
> +	mm <<= 32;
> +
> +	ml += mm;
> +	if (ml < mm) /* overflow */
> +		mh++;
> +
> +	/*
> +	 * Reduce the 128bit result to fit in a 64bit dividend:
> +	 *   m / d -> (m / 2^n) / (d / 2^n)
> +	 */
> +	while (mh) {
> +		ml >>= 1;
> +		if (mh & 1)
> +			ml |= 1ULL << 63;
> +		mh >>= 1;
> +		d >>= 1;
> +	}
> +
> +	if (unlikely(!d))
> +		return ml;
> +
> +	return div64_u64(ml, d);
> +}
> +
>  static int tg_set_rt_bandwidth(struct task_group *tg,
>  		u64 rt_period, u64 rt_runtime)
>  {
>  	int i, err = 0;
> +	u64 g_period = global_rt_period();
> +	u64 g_runtime = global_rt_runtime();
>  
>  	mutex_lock(&rt_constraints_mutex);
>  	read_lock(&tasklist_lock);
> @@ -7400,6 +7449,9 @@ static int tg_set_rt_bandwidth(struct ta
>  	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
>  	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
>  	tg->rt_bandwidth.rt_runtime = rt_runtime;
> +	tg->rt_bandwidth.rt_max_runtime = (g_runtime == RUNTIME_INF) ?
> +		rt_period :
> +		mul_u64_u64_div_u64(rt_period, g_runtime, g_period);
>  
>  	for_each_possible_cpu(i) {
>  		struct rt_rq *rt_rq = tg->rt_rq[i];
> @@ -7577,6 +7629,7 @@ static int sched_rt_global_validate(void
>  static void sched_rt_do_global(void)
>  {
>  	def_rt_bandwidth.rt_runtime = global_rt_runtime();
> +	def_rt_bandwidth.rt_max_runtime = global_rt_runtime();
>  	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
>  }
>  
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -614,12 +614,12 @@ static int do_balance_runtime(struct rt_
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>  	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
>  	int i, weight, more = 0;
> -	u64 rt_period;
> +	u64 rt_max_runtime;
>  
>  	weight = cpumask_weight(rd->span);
>  
>  	raw_spin_lock(&rt_b->rt_runtime_lock);
> -	rt_period = ktime_to_ns(rt_b->rt_period);
> +	rt_max_runtime = rt_b->rt_max_runtime;
>  	for_each_cpu(i, rd->span) {
>  		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
>  		s64 diff;
> @@ -643,12 +643,12 @@ static int do_balance_runtime(struct rt_
>  		diff = iter->rt_runtime - iter->rt_time;
>  		if (diff > 0) {
>  			diff = div_u64((u64)diff, weight);
> -			if (rt_rq->rt_runtime + diff > rt_period)
> -				diff = rt_period - rt_rq->rt_runtime;
> +			if (rt_rq->rt_runtime + diff > rt_max_runtime)
> +				diff = rt_max_runtime - rt_rq->rt_runtime;
>  			iter->rt_runtime -= diff;
>  			rt_rq->rt_runtime += diff;
>  			more = 1;
> -			if (rt_rq->rt_runtime == rt_period) {
> +			if (rt_rq->rt_runtime == rt_max_runtime) {
>  				raw_spin_unlock(&iter->rt_runtime_lock);
>  				break;
>  			}
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -124,6 +124,7 @@ struct rt_bandwidth {
>  	raw_spinlock_t		rt_runtime_lock;
>  	ktime_t			rt_period;
>  	u64			rt_runtime;
> +	u64			rt_max_runtime;
>  	struct hrtimer		rt_period_timer;
>  };
>  /*
> 


* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
  2014-05-14 15:44 ` Paul E. McKenney
  2014-05-19 12:40 ` Peter Zijlstra
@ 2014-11-27 11:21 ` Wanpeng Li
  2 siblings, 0 replies; 31+ messages in thread
From: Wanpeng Li @ 2014-11-27 11:21 UTC (permalink / raw)
  To: Paul Gortmaker, linux-kernel
  Cc: linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Paul E. McKenney

Hi Paul,
On 5/14/14, 11:08 PM, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
>
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context.  Consider the following test:
>
>    echo "main() {for(;;);}" > full_load.c
>    gcc full_load.c -o full_load
>    taskset -c 1 ./full_load &
>    chrt -r -p 80 `pidof full_load`

I tried this on 3.18-rc6 with CONFIG_RCU_CPU_STALL_TIMEOUT=60 and
SCHED_FEAT(RT_RUNTIME_SHARE, true), but I don't see an RCU stall
warning. What am I missing?

Regards,
Wanpeng Li

> When run on x86_64 defconfig, what happens is as follows:
>
> -task runs on core1 for 95% of an rt_period as documented in
>   the file Documentation/scheduler/sched-rt-group.txt
>
> -at 95%, the code in balance_runtime sees this threshold and
>   calls do_balance_runtime()
>
> -do_balance_runtime sees that core 1 is in need, and does this:
> 	---------------
>          if (rt_rq->rt_runtime + diff > rt_period)
>                  diff = rt_period - rt_rq->rt_runtime;
>          iter->rt_runtime -= diff;
>          rt_rq->rt_runtime += diff;
> 	---------------
>   which extends core1's rt_runtime by 5%, making it 100% of rt_period
>   by stealing 5% from core0 (or possibly some other core).
>
> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> we hit this near the top of that function:
> 	---------------
>          if (runtime >= sched_rt_period(rt_rq))
>                  return 0;
> 	---------------
> and hence we'll _never_ look at/set any of the throttling checks and
> messages in sched_rt_runtime_exceeded().  Instead, we will happily
> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> the RCU subsystem will get angry and trigger an NMI in response to
> what it rightly sees as a WTF situation.
>
> Granted, there are lots of ways you can do bad things to yourself with
> RT, but in the current zeitgeist of multicore systems with people
> dedicating individual cores to individual tasks, I'd say the above is
> common enough that we should react to it sensibly, and an RCU stall
> really doesn't translate well to an end user vs a simple message that
> says "throttling activated".
>
> One way to get the throttle message instead of the ambiguous and lengthy
> NMI triggered all core backtrace of the RCU stall is to change the
> SCHED_FEAT(RT_RUNTIME_SHARE, true) to false.  One could make a good
> case for this being the default for the out-of-tree preempt-rt series,
> since folks using that are more apt to be manually tuning the system
> and won't want an invisible hand coming in and making changes.
>
> However, in mainline, where it is more likely that there will be
> n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
> and still avoid the RCU stalls by noting that there is no point in
> trying to balance when there are no tasks to migrate, or only a
> single RT task is present.  Inflating the rt_runtime does nothing
> in this case other than defeat sched_rt_runtime_exceeded().
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>
> [I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
>   and tglx asked why they wouldn't see the throttle message; it is only
>   now that I had a chance to dig in and figure out why.  Oh, and the patch
>   is against linux-next, in case that matters...]
>
>   kernel/sched/rt.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ea4d500..698aac9 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
>   	if (!sched_feat(RT_RUNTIME_SHARE))
>   		return more;
>   
> +	/*
> +	 * Stealing from another core won't help us at all if
> +	 * we have nothing to migrate over there, or only one
> +	 * task that is running up all the rt_time.  In fact it
> +	 * will just inhibit the throttling message in that case.
> +	 */
> +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
> +		return more;
> +
>   	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>   		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>   		more = do_balance_runtime(rt_rq);
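
The interaction described in the quoted changelog can be modelled in a few lines. This is a simplified userspace sketch under stated assumptions: `struct rt_rq_model`, `balancing_is_futile()` and `would_throttle()` are hypothetical names, and `would_throttle()` mirrors only the early exit and the budget comparison, not the rest of sched_rt_runtime_exceeded().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rt_rq fields the patch touches. */
struct rt_rq_model {
	uint64_t rt_time;		/* RT time consumed this period, in ns */
	uint64_t rt_runtime;		/* RT budget for this period, in ns */
	unsigned int rt_nr_migratory;	/* RT tasks allowed on more than one CPU */
	unsigned int rt_nr_total;	/* all RT tasks queued on this rt_rq */
};

/* Same condition as the guard the patch adds to balance_runtime(). */
static bool balancing_is_futile(const struct rt_rq_model *rq)
{
	return rq->rt_nr_migratory == 0 || rq->rt_nr_total == 1;
}

/* Early exit plus budget check, as quoted from sched_rt_runtime_exceeded(). */
static bool would_throttle(const struct rt_rq_model *rq, uint64_t period)
{
	/* A budget equal to the whole period skips throttling entirely. */
	if (rq->rt_runtime >= period)
		return false;

	return rq->rt_time > rq->rt_runtime;
}
```

For the pinned full_load task, the model has rt_nr_migratory == 0 and rt_nr_total == 1, so balancing is skipped, the budget stays at 95% of the period, and would_throttle() fires once rt_time passes it. If the budget has instead been inflated to the full period, would_throttle() returns false no matter how much rt_time accumulates.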



end of thread; newest message 2014-11-27 15:31 UTC

Thread overview: 31+ messages
2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
2014-05-14 15:44 ` Paul E. McKenney
2014-05-14 19:11   ` Paul Gortmaker
2014-05-14 19:27     ` Paul E. McKenney
2014-05-15  2:49     ` Mike Galbraith
2014-05-15 14:09       ` Paul Gortmaker
2014-11-27  9:17       ` Wanpeng Li
2014-11-27 15:31         ` Mike Galbraith
2014-11-27 11:36     ` Wanpeng Li
2014-05-15  3:18   ` Mike Galbraith
2014-05-15 14:45     ` Paul E. McKenney
2014-05-15 17:27       ` Mike Galbraith
2014-05-18  4:22     ` Mike Galbraith
2014-05-18  5:20       ` Paul E. McKenney
2014-05-18  8:36         ` Mike Galbraith
2014-05-18 15:58           ` Paul E. McKenney
2014-05-19  2:44             ` Mike Galbraith
2014-05-19  5:34               ` Paul E. McKenney
2014-05-20 14:53                 ` Frederic Weisbecker
2014-05-20 15:53                   ` Paul E. McKenney
2014-05-20 16:24                     ` Frederic Weisbecker
2014-05-20 16:36                       ` Peter Zijlstra
2014-05-20 17:20                       ` Paul E. McKenney
2014-05-21  4:29                         ` Mike Galbraith
2014-05-21  4:18                     ` Mike Galbraith
2014-05-21 12:03                       ` Paul E. McKenney
2014-05-21  3:52                   ` Mike Galbraith
2014-05-19 10:54     ` Peter Zijlstra
2014-05-19 12:40 ` Peter Zijlstra
2014-05-22 19:40   ` Paul Gortmaker
2014-11-27 11:21 ` Wanpeng Li
