* [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
@ 2014-05-14 15:08 Paul Gortmaker
2014-05-14 15:44 ` Paul E. McKenney
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-14 15:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-rt-users, Paul Gortmaker, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, Thomas Gleixner, Paul E. McKenney
Commit ac086bc22997a2be24fc40fc8d46522fe7e03d11 ("sched: rt-group:
smp balancing") introduced the concept of borrowing per-cpu
rt_runtime from one core to another.
However, this prevents the RT throttling message from ever being
emitted when someone does a common (but mistaken) attempt at
using too much CPU in RT context. Consider the following test:
echo "main() {for(;;);}" > full_load.c
gcc full_load.c -o full_load
taskset -c 1 ./full_load &
chrt -r -p 80 `pidof full_load`
When run on x86_64 defconfig, what happens is as follows:
-task runs on core1 for 95% of an rt_period as documented in
the file Documentation/scheduler/sched-rt-group.txt
-at 95%, the code in balance_runtime sees this threshold and
calls do_balance_runtime()
-do_balance_runtime sees that core 1 is in need, and does this:
---------------
	if (rt_rq->rt_runtime + diff > rt_period)
		diff = rt_period - rt_rq->rt_runtime;
	iter->rt_runtime -= diff;
	rt_rq->rt_runtime += diff;
---------------
which extends core1's rt_runtime by 5%, making it 100% of rt_period
by stealing 5% from core0 (or possibly some other core).
However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
we hit this near the top of that function:
---------------
	if (runtime >= sched_rt_period(rt_rq))
		return 0;
---------------
and hence we'll _never_ look at/set any of the throttling checks and
messages in sched_rt_runtime_exceeded(). Instead, we will happily
plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
the RCU subsystem will get angry and trigger an NMI in response to
what it rightly sees as a WTF situation.
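[Editorial aside: the arithmetic above can be sketched in a few lines. This is a
simplified model using the default 95%/100% budget values from
sched-rt-group.txt, not the kernel's actual balancing algorithm; the variable
names and the borrowed amount are hypothetical.]

```python
# Simplified model of the borrow-then-never-throttle interaction.
rt_period = 1_000_000      # default sched_rt_period_us
core1_runtime = 950_000    # core1's 95% budget, fully consumed by the hog
core0_runtime = 950_000    # a neighbouring core with spare budget
diff = 100_000             # hypothetical amount core0 offers to lend

# do_balance_runtime() clamps the loan so runtime never exceeds the period:
if core1_runtime + diff > rt_period:
    diff = rt_period - core1_runtime
core0_runtime -= diff
core1_runtime += diff

assert core1_runtime == rt_period   # core1 may now run 100% of the period

# sched_rt_runtime_exceeded() then bails out before any throttling check:
def runtime_exceeded(runtime, period):
    if runtime >= period:
        return 0    # never throttled, never warned -> eventual RCU stall
    return 1        # placeholder for the real throttling path

assert runtime_exceeded(core1_runtime, rt_period) == 0
```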
Granted, there are lots of ways you can do bad things to yourself with
RT, but in the current zeitgeist of multicore systems with people
dedicating individual cores to individual tasks, I'd say the above is
common enough that we should react to it sensibly, and an RCU stall
really doesn't translate well to an end user vs a simple message that
says "throttling activated".
One way to get the throttle message instead of the ambiguous and lengthy
NMI triggered all core backtrace of the RCU stall is to change the
SCHED_FEAT(RT_RUNTIME_SHARE, true) to false. One could make a good
case for this being the default for the out-of-tree preempt-rt series,
since folks using that are more apt to be manually tuning the system
and won't want an invisible hand coming in and making changes.
However, in mainline, where it is more likely that there will be
n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
and still avoid the RCU stalls by noting that there is no point in
trying to balance when there are no tasks to migrate, or only a
single RT task is present. Inflating the rt_runtime does nothing
in this case other than defeat sched_rt_runtime_exceeded().
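[Editorial aside: the condition the patch adds can be modeled as a tiny
predicate. This is a sketch, not kernel code; only the two rt_rq field names
mirror the patch below.]

```python
# Sketch of the futility check: borrowing runtime only helps if some other
# task could actually make use of the borrowed time on this runqueue.
def balancing_is_futile(rt_nr_migratory, rt_nr_total):
    """True when inflating rt_runtime cannot help and only hides throttling."""
    # Nothing can migrate here, or the lone RT task is the one burning rt_time:
    return rt_nr_migratory == 0 or rt_nr_total == 1

# A pinned single hog (the test case above): balancing is pointless.
assert balancing_is_futile(rt_nr_migratory=0, rt_nr_total=1)
# Two RT tasks with one migratable: balancing can genuinely help.
assert not balancing_is_futile(rt_nr_migratory=1, rt_nr_total=2)
```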
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
[I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
and tglx asked why they wouldn't see the throttle message; it is only
now that I had a chance to dig in and figure out why. Oh, and the patch
is against linux-next, in case that matters...]
kernel/sched/rt.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ea4d500..698aac9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
 	if (!sched_feat(RT_RUNTIME_SHARE))
 		return more;
 
+	/*
+	 * Stealing from another core won't help us at all if
+	 * we have nothing to migrate over there, or only one
+	 * task that is running up all the rt_time. In fact it
+	 * will just inhibit the throttling message in that case.
+	 */
+	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
+		return more;
+
 	if (rt_rq->rt_time > rt_rq->rt_runtime) {
 		raw_spin_unlock(&rt_rq->rt_runtime_lock);
 		more = do_balance_runtime(rt_rq);
--
1.8.2.3
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Paul E. McKenney @ 2014-05-14 15:44 UTC (permalink / raw)
To: Paul Gortmaker
Cc: linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra,
    Steven Rostedt, Thomas Gleixner

On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:

[...]

> and hence we'll _never_ look at/set any of the throttling checks and
> messages in sched_rt_runtime_exceeded(). Instead, we will happily
> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> the RCU subsystem will get angry and trigger an NMI in response to
> what it rightly sees as a WTF situation.

In theory, one way of making RCU OK with an RT usermode CPU hog is to
build with Frederic's CONFIG_NO_HZ_FULL=y. This will cause RCU to see
CPUs having a single runnable usermode task as idle, preventing the RCU
CPU stall warning. This does work well for mainline kernels in the lab.

In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
for -rt kernels in production environments.

But leaving practice aside for the moment...

[...]

> + if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)

How about something like the following to take NO_HZ_FULL into account?

+	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
+	    !tick_nohz_full_cpu(cpu))

							Thanx, Paul
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Paul Gortmaker @ 2014-05-14 19:11 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

[Added Frederic to Cc: since we are now talking nohz stuff]

On 14/05/2014 (Wed 08:44) Paul E. McKenney wrote:

[...]

> In theory, one way of making RCU OK with an RT usermode CPU hog is to
> build with Frederic's CONFIG_NO_HZ_FULL=y. This will cause RCU to see
> CPUs having a single runnable usermode task as idle, preventing the RCU
> CPU stall warning. This does work well for mainline kernels in the lab.

Agreed; wanting to test that locally for myself meant moving to a more
modern machine, as the older PentiumD doesn't support NO_HZ_FULL. But
on the newer box (dual socket, six cores in each) I found the stall
harder to trigger without going back to using the threadirqs boot arg as
used in the earlier lkml post referenced below. (Why? Not sure...)

Once I did that though (boot vanilla linux-next with threadirqs) I
confirmed what you said, i.e. that we would reliably get a stall with
the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also
RCU_USER_QS=y).

[...]

> How about something like the following to take NO_HZ_FULL into account?
>
> +	if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) &&
> +	    !tick_nohz_full_cpu(cpu))

Yes, I think special casing nohz_full can make sense, but maybe not
exactly here in balance_runtime? The underlying reasoning doesn't
change on nohz_full; if only one task is present, or nothing can
migrate, then the call to do_balance_runtime is largely useless - we'll
walk possibly all cpus in search of an rt_rq to steal from, and what we
steal, we can't use - so we've artificially crippled the other rt_rq for
nothing other than to artificially inflate our rt_runtime and thus allow
100% usage.

Given that, perhaps a separate change to sched_rt_runtime_exceeded()
that works out the CPU from the rt_rq, and returns zero if it is a
nohz_full cpu? Does that make sense? Then the nohz_full people won't
get the throttling message even if they go 100%.

Paul.
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Paul E. McKenney @ 2014-05-14 19:27 UTC (permalink / raw)
To: Paul Gortmaker
Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, May 14, 2014 at 03:11:00PM -0400, Paul Gortmaker wrote:

[...]

> Once I did that though (boot vanilla linux-next with threadirqs) I
> confirmed what you said, i.e. that we would reliably get a stall with
> the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also
> RCU_USER_QS=y).

Nice!!! Thank you for checking this out!

[...]

> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu? Does that make sense? Then the nohz_full people won't
> get the throttling message even if they go 100%.

Makes sense to me! Then again, I am no scheduler expert.

							Thanx, Paul
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Mike Galbraith @ 2014-05-15  2:49 UTC (permalink / raw)
To: Paul Gortmaker
Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
    Frederic Weisbecker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, 2014-05-14 at 15:11 -0400, Paul Gortmaker wrote:

> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu? Does that make sense? Then the nohz_full people won't
> get the throttling message even if they go 100%.

I don't get it. What reason would there be to run a hog on a dedicated
core at realtime policy/priority? Given no competition, there's nothing
to prioritize; you could just as well run a critical task as SCHED_IDLE.

I would also expect that anyone wanting bare metal will have all of
their critical cores isolated from the scheduler, watchdogs turned off
as well as that noisy throttle, the whole point being to make things as
silent as possible. Seems to me tick_nohz_full_cpu(cpu) should be
predicated on that cpu being isolated from the #1 noise source, the
scheduler and its load balancing. There's just no point to nohz_full
without that, or if there is, I sure don't see it.

When I see people trying to run a hog as a realtime task, it's because
they are trying in vain to keep competition away from precious cores..
and one mlockall with a realtime hog blocking flush_work() gives them a
wakeup call.

	-Mike
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Paul Gortmaker @ 2014-05-15 14:09 UTC (permalink / raw)
To: Mike Galbraith
Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
    Frederic Weisbecker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On 14-05-14 10:49 PM, Mike Galbraith wrote:

> I don't get it. What reason would there be to run a hog on a dedicated
> core at realtime policy/priority? Given no competition, there's nothing
> to prioritize; you could just as well run a critical task as SCHED_IDLE.

Well, as per the original commit log, we acknowledge that people will do
stupid things that don't make 100% sense, and when they do, we should
ideally behave in a sane fashion in response to that.

And I don't think that "no competition" is a given for most folks. They
see all these internal threads running and just figure they can chrt
their way to a solution, vs. taking the time to clean up, enable
RCU_NOCB, etc. Don't get me wrong; I'm not defending such behaviour...

> I would also expect that anyone wanting bare metal will have all of
> their critical cores isolated from the scheduler, watchdogs turned off
> as well as that noisy throttle, the whole point being to make things as
> silent as possible.

An interesting point. One could argue that the default for the
nohz_full cores should be to be isolated from the scheduler, vs. needing
to be manually excluded.

P.
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Wanpeng Li @ 2014-11-27  9:17 UTC (permalink / raw)
To: Mike Galbraith, Paul Gortmaker
Cc: Paul E. McKenney, linux-kernel, linux-rt-users, Ingo Molnar,
    Frederic Weisbecker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Hi Mike,

On 5/15/14, 10:49 AM, Mike Galbraith wrote:

> I would also expect that anyone wanting bare metal will have all of
> their critical cores isolated from the scheduler, watchdogs turned off
> as well as that noisy throttle, the whole point being to make things as
> silent as possible. Seems to me tick_nohz_full_cpu(cpu) should be
> predicated on that cpu being isolated from the #1 noise source, the
> scheduler and its load balancing.

Does the tick still need to be handled if a cpu is isolated without
nohz_full enabled?

Regards,
Wanpeng Li
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
From: Mike Galbraith @ 2014-11-27 15:31 UTC (permalink / raw)
To: Wanpeng Li
Cc: Paul Gortmaker, Paul E. McKenney, linux-kernel, linux-rt-users,
    Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Steven Rostedt,
    Thomas Gleixner

On Thu, 2014-11-27 at 17:17 +0800, Wanpeng Li wrote:

> Does the tick still need to be handled if a cpu is isolated without
> nohz_full enabled?

"No" would lead to some awkward questions.

	-Mike
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-14 19:11 ` Paul Gortmaker 2014-05-14 19:27 ` Paul E. McKenney 2014-05-15 2:49 ` Mike Galbraith @ 2014-11-27 11:36 ` Wanpeng Li 2 siblings, 0 replies; 31+ messages in thread From: Wanpeng Li @ 2014-11-27 11:36 UTC (permalink / raw) To: Paul Gortmaker, Paul E. McKenney Cc: linux-kernel, linux-rt-users, Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner Hi Paul, On 5/15/14, 3:11 AM, Paul Gortmaker wrote: > [Added Frederic to Cc: since we are now talking nohz stuff] > > [Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile] On 14/05/2014 (Wed 08:44) Paul E. McKenney wrote: > >> On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote: >>> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11 >>> ("sched: rt-group: smp balancing") the concept of borrowing per >>> cpu rt_runtime from one core to another was introduced. >>> >>> However, this prevents the RT throttling message from ever being >>> emitted when someone does a common (but mistaken) attempt at >>> using too much CPU in RT context. 
Consider the following test: >>> >>> echo "main() {for(;;);}" > full_load.c >>> gcc full_load.c -o full_load >>> taskset -c 1 ./full_load & >>> chrt -r -p 80 `pidof full_load` >>> >>> When run on x86_64 defconfig, what happens is as follows: >>> >>> -task runs on core1 for 95% of an rt_period as documented in >>> the file Documentation/scheduler/sched-rt-group.txt >>> >>> -at 95%, the code in balance_runtime sees this threshold and >>> calls do_balance_runtime() >>> >>> -do_balance_runtime sees that core 1 is in need, and does this: >>> --------------- >>> if (rt_rq->rt_runtime + diff > rt_period) >>> diff = rt_period - rt_rq->rt_runtime; >>> iter->rt_runtime -= diff; >>> rt_rq->rt_runtime += diff; >>> --------------- >>> which extends core1's rt_runtime by 5%, making it 100% of rt_period >>> by stealing 5% from core0 (or possibly some other core). >>> >>> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(), >>> we hit this near the top of that function: >>> --------------- >>> if (runtime >= sched_rt_period(rt_rq)) >>> return 0; >>> --------------- >>> and hence we'll _never_ look at/set any of the throttling checks and >>> messages in sched_rt_runtime_exceeded(). Instead, we will happily >>> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point >>> the RCU subsystem will get angry and trigger an NMI in response to >>> what it rightly sees as a WTF situation. >> In theory, one way of making RCU OK with an RT usermode CPU hog is to >> build with Frederic's CONFIG_NO_HZ_FULL=y. This will cause RCU to see >> CPUs having a single runnable usermode task as idle, preventing the RCU >> CPU stall warning. This does work well for mainline kernel in the lab. > Agreed; wanting to test that locally for myself meant moving to a more > modern machine, as the older PentiumD doesn't support NO_HZ_FULL. But Could you point out which hw feature support NO_HZ_FULL? How to check it through cpuid? 
Regards, Wanpeng Li > on the newer box (dual socket six cores in each) I found the stall > harder to trigger w/o going back to using the threadirqs boot arg as > used in the earlier lkml post referenced below. (Why? Not sure...) > > Once I did that though (boot vanilla linux-next with threadirqs) I > confirmed what you said; i.e. that we would reliably get a stall with > the defconfig of NOHZ_IDLE=y but not with NOHZ_FULL=y (and hence also > RCU_USER_QS=y). > >> In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received >> for -rt kernels in production environments. >> >> But leaving practice aside for the moment... >> > [...] > >>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c >>> index ea4d500..698aac9 100644 >>> --- a/kernel/sched/rt.c >>> +++ b/kernel/sched/rt.c >>> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq) >>> if (!sched_feat(RT_RUNTIME_SHARE)) >>> return more; >>> >>> + /* >>> + * Stealing from another core won't help us at all if >>> + * we have nothing to migrate over there, or only one >>> + * task that is running up all the rt_time. In fact it >>> + * will just inhibit the throttling message in that case. >>> + */ >>> + if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) >> How about something like the following to take NO_HZ_FULL into account? >> >> + if ((!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1) && >> + !tick_nohz_full_cpu(cpu)) > Yes, I think special casing nohz_full can make sense, but maybe not > exactly here in balance_runtime? Since the underlying reasoning doesn't > change on nohz_full ; if only one task is present, or nothing can > migrate, then the call to do_balance_runtime is largely useless - we'll > walk possibly all cpus in search of an rt_rq to steal from, and what we > steal, we can't use - so we've artificially crippled the other rt_rq for > nothing other than to artifically inflate our rt_runtime and thus allow > 100% usage. 
>
> Given that, perhaps a separate change to sched_rt_runtime_exceeded()
> that works out the CPU from the rt_rq, and returns zero if it is a
> nohz_full cpu?  Does that make sense?  Then the nohz_full people won't
> get the throttling message even if they go 100%.
>
> Paul.
> --
>
>>							Thanx, Paul
>>
>>> +		return more;
>>> +
>>>  	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>>>  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>>>  		more = do_balance_runtime(rt_rq);
>>> --
>>> 1.8.2.3
>>>
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile

From: Mike Galbraith @ 2014-05-15  3:18 UTC
To: paulmck
Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:

> In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> for -rt kernels in production environments.

I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
all with 60 cores isolated.  I didn't have time to rummage, but it looks
like there are still bugs to squash.

Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
fast mover performance last time I measured.

	-Mike
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile

From: Paul E. McKenney @ 2014-05-15 14:45 UTC
To: Mike Galbraith
Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
>
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
>
> I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash.
>
> Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> fast mover performance last time I measured.

I do have a report of the RCU grace-period kthreads (rcu_preempt,
rcu_sched, and rcu_bh) consuming excessive CPU time on large boxes,
but this is for workloads with lots of threads and context switches.

Whether relevant or not to your situation, working on it...

							Thanx, Paul
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile

From: Mike Galbraith @ 2014-05-15 17:27 UTC
To: paulmck
Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, 2014-05-15 at 07:45 -0700, Paul E. McKenney wrote:
> On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> > On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> >
> > > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > > for -rt kernels in production environments.
> >
> > I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> > all with 60 cores isolated.  I didn't have time to rummage, but it looks
> > like there are still bugs to squash.
> >
> > Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> > fast mover performance last time I measured.
>
> I do have a report of the RCU grace-period kthreads (rcu_preempt,
> rcu_sched, and rcu_bh) consuming excessive CPU time on large boxes,
> but this is for workloads with lots of threads and context switches.
>
> Whether relevant or not to your situation, working on it...

RCU signal was swamped by accounting.

	-Mike
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile

From: Mike Galbraith @ 2014-05-18  4:22 UTC
To: paulmck
Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Thu, 2014-05-15 at 05:18 +0200, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
>
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
>
> I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash.

I tested a bit more yesterday.  With only NO_HZ_FULL (not
NO_HZ_FULL_ALL), it did go tickless.  A 10 second sample of perturbation
numbers is below.  Pretty noisy, but it does work in rt.

Below that are some jitter numbers, using a simplified and imperfect
model of a high end rt video game (game over, insert 1 gold bar to
continue variety of high end) executive synchronizing a simple load on
60 cores.  Bottom line: if the user thinks that booting nohz_full=<set>
to get whatever core quiescence that provides should improve jitter for
a threaded load, despite the tick not being able to shut down, he's
wrong.
	-Mike

vogelweide:/abuild/mike/:[0]# head -180 xx|tail -60
pert/s: 500 >14.10us: 2 min: 1.90 max: 96.86 avg: 5.17 sum/s: 2586us overhead: 0.26%
pert/s: 1 >18.63us: 0 min: 3.38 max: 8.37 avg: 4.71 sum/s: 5us overhead: 0.00%
pert/s: 1 >18.54us: 0 min: 3.41 max: 8.51 avg: 4.61 sum/s: 5us overhead: 0.00%
pert/s: 1 >18.78us: 0 min: 3.45 max: 8.67 avg: 4.41 sum/s: 4us overhead: 0.00%
pert/s: 1 >19.60us: 0 min: 2.68 max: 7.47 avg: 4.30 sum/s: 4us overhead: 0.00%
pert/s: 1 >20.61us: 0 min: 2.60 max: 7.54 avg: 3.94 sum/s: 4us overhead: 0.00%
pert/s: 1 >21.03us: 0 min: 2.70 max: 7.81 avg: 3.91 sum/s: 4us overhead: 0.00%
pert/s: 1 >19.89us: 0 min: 2.65 max: 7.83 avg: 3.98 sum/s: 4us overhead: 0.00%
pert/s: 1 >29.36us: 0 min: 3.78 max: 9.82 avg: 6.10 sum/s: 6us overhead: 0.00%
pert/s: 1 >28.86us: 0 min: 4.56 max: 10.12 avg: 6.36 sum/s: 6us overhead: 0.00%
pert/s: 1 >21.34us: 0 min: 2.54 max: 7.79 avg: 3.38 sum/s: 3us overhead: 0.00%
pert/s: 1 >30.12us: 0 min: 3.51 max: 8.41 avg: 4.47 sum/s: 4us overhead: 0.00%
pert/s: 1 >21.01us: 0 min: 2.44 max: 7.36 avg: 3.33 sum/s: 3us overhead: 0.00%
pert/s: 1 >22.41us: 0 min: 2.42 max: 7.71 avg: 3.60 sum/s: 4us overhead: 0.00%
pert/s: 1 >30.37us: 0 min: 3.46 max: 8.62 avg: 4.49 sum/s: 4us overhead: 0.00%
pert/s: 1 >29.50us: 0 min: 3.43 max: 9.23 avg: 4.13 sum/s: 4us overhead: 0.00%
pert/s: 1 >20.66us: 0 min: 2.49 max: 7.73 avg: 4.24 sum/s: 4us overhead: 0.00%
pert/s: 1 >33.87us: 0 min: 4.45 max: 9.59 avg: 5.63 sum/s: 6us overhead: 0.00%
pert/s: 1 >34.70us: 0 min: 4.47 max: 10.15 avg: 6.41 sum/s: 6us overhead: 0.00%
pert/s: 1 >29.62us: 0 min: 4.49 max: 9.87 avg: 5.69 sum/s: 6us overhead: 0.00%
pert/s: 1 >36.92us: 0 min: 3.53 max: 9.41 avg: 4.48 sum/s: 4us overhead: 0.00%
pert/s: 1 >35.31us: 0 min: 3.69 max: 9.00 avg: 5.30 sum/s: 5us overhead: 0.00%
pert/s: 1 >36.29us: 0 min: 3.34 max: 8.48 avg: 4.48 sum/s: 4us overhead: 0.00%
pert/s: 1 >34.90us: 0 min: 3.39 max: 9.21 avg: 4.45 sum/s: 4us overhead: 0.00%
pert/s: 1 >34.23us: 0 min: 3.37 max: 8.44 avg: 4.54 sum/s: 5us overhead: 0.00%
pert/s: 1 >34.45us: 0 min: 0.05 max: 9.41 avg: 4.40 sum/s: 5us overhead: 0.00%
pert/s: 1 >35.31us: 0 min: 3.89 max: 9.18 avg: 4.30 sum/s: 5us overhead: 0.00%
pert/s: 1 >35.98us: 0 min: 2.80 max: 9.28 avg: 4.74 sum/s: 5us overhead: 0.00%
pert/s: 1 >33.89us: 0 min: 3.15 max: 9.67 avg: 5.07 sum/s: 5us overhead: 0.00%
pert/s: 1 >35.16us: 0 min: 2.56 max: 9.40 avg: 4.84 sum/s: 5us overhead: 0.00%
pert/s: 1 >36.37us: 0 min: 4.49 max: 9.48 avg: 6.12 sum/s: 6us overhead: 0.00%
pert/s: 1 >38.10us: 0 min: 0.04 max: 34.86 avg: 6.62 sum/s: 13us overhead: 0.00%
pert/s: 1 >35.11us: 0 min: 5.05 max: 11.56 avg: 5.88 sum/s: 6us overhead: 0.00%
pert/s: 1 >36.88us: 0 min: 3.77 max: 12.37 avg: 6.13 sum/s: 6us overhead: 0.00%
pert/s: 1 >34.37us: 1 min: 2.08 max:199.64 avg: 20.67 sum/s: 25us overhead: 0.00%
pert/s: 1 >35.57us: 1 min: 2.11 max:198.61 avg: 19.17 sum/s: 25us overhead: 0.00%
pert/s: 1 >33.89us: 1 min: 2.46 max:199.49 avg: 19.85 sum/s: 26us overhead: 0.00%
pert/s: 1 >37.58us: 1 min: 2.34 max:199.79 avg: 19.59 sum/s: 25us overhead: 0.00%
pert/s: 1 >34.57us: 0 min: 3.43 max: 13.37 avg: 5.86 sum/s: 6us overhead: 0.00%
pert/s: 1 >21.10us: 1 min: 2.42 max:199.97 avg: 20.08 sum/s: 26us overhead: 0.00%
pert/s: 1 >20.86us: 1 min: 2.23 max:194.83 avg: 19.69 sum/s: 26us overhead: 0.00%
pert/s: 1 >22.47us: 1 min: 2.15 max:197.13 avg: 19.61 sum/s: 25us overhead: 0.00%
pert/s: 1 >21.42us: 1 min: 2.24 max:198.75 avg: 19.70 sum/s: 26us overhead: 0.00%
pert/s: 1 >34.85us: 0 min: 0.05 max: 10.83 avg: 3.80 sum/s: 5us overhead: 0.00%
pert/s: 1 >33.72us: 0 min: 4.34 max: 11.78 avg: 6.04 sum/s: 6us overhead: 0.00%
pert/s: 1 >21.49us: 2 min: 2.13 max:200.35 avg: 20.22 sum/s: 26us overhead: 0.00%
pert/s: 1 >22.52us: 2 min: 2.32 max:197.07 avg: 21.02 sum/s: 27us overhead: 0.00%
pert/s: 1 >22.35us: 2 min: 2.16 max:197.04 avg: 20.59 sum/s: 27us overhead: 0.00%
pert/s: 1 >35.38us: 0 min: 3.20 max: 10.42 avg: 4.56 sum/s: 5us overhead: 0.00%
pert/s: 306 >17.41us: 1 min: 1.31 max: 51.95 avg: 6.83 sum/s: 2091us overhead: 0.21%
pert/s: 1 >99.81us: 1 min: 2.11 max:196.91 avg: 20.61 sum/s: 27us overhead: 0.00%
pert/s: 1 >21.45us: 2 min: 2.31 max:196.62 avg: 20.49 sum/s: 27us overhead: 0.00%
pert/s: 1 >97.14us: 1 min: 2.22 max:195.97 avg: 21.29 sum/s: 28us overhead: 0.00%
pert/s: 1 >21.94us: 2 min: 2.25 max:199.98 avg: 20.14 sum/s: 26us overhead: 0.00%
pert/s: 206 >75.07us: 1 min: 1.60 max:116.17 avg: 5.83 sum/s: 1202us overhead: 0.12%
pert/s: 1 >96.39us: 1 min: 2.26 max:194.60 avg: 21.29 sum/s: 28us overhead: 0.00%
pert/s: 1 >94.72us: 1 min: 2.20 max:193.63 avg: 21.32 sum/s: 28us overhead: 0.00%
pert/s: 1 >97.23us: 1 min: 2.08 max:198.18 avg: 21.27 sum/s: 28us overhead: 0.00%
pert/s: 1 >88.44us: 0 min: 2.28 max: 11.33 avg: 7.05 sum/s: 8us overhead: 0.00%
pert/s: 1 >89.22us: 0 min: 3.59 max: 15.86 avg: 7.81 sum/s: 8us overhead: 0.00%

model is not picky, calls frame jitter >30us a 'Flier', counts them,
tags a few.

3.14.4-rt5 virgin source   nohz_full=4-63

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames   Min    Max(Frame)      Avg    Sigma   LastTrans  Fliers(Frames)
 4 1727998 0.0159 184.04 (1170916) 0.7079 0.7321 0 (0) 16 (955515,955516,986596,986597,1017316,..1561069)
 5 1727998 0.0159 186.94 (1171397) 0.4114 0.6508 0 (0) 16 (956356,956357,987076,987077,1017796,..1171397)
 6 1727999 0.0159 36.73 (595340) 0.8620 0.8818 0 (0) 11 (86411,86412,91211,91212,96011,..1209942)
 7 1727999 0.0159 189.53 (1141636) 0.4791 0.6720 0 (0) 17 (895876,926596,926597,957316,957317,..1141637)
 8 1728000 0.0159 184.07 (988517) 0.3885 0.6788 0 (0) 16 (773476,773477,804196,804197,834916,..988517)
 9 1728000 0.0159 180.74 (1050437) 0.3514 0.6649 0 (0) 16 (835396,835397,866116,866117,896836,..1050437)
10 1728000 0.0159 188.84 (1020197) 0.4211 0.6945 0 (0) 16 (805156,805157,835876,835877,866596,..1020197)
11 1728000 0.0159 180.98 (959237) 0.3867 0.6802 0 (0) 16 (744196,744197,774916,774917,805636,..959237)
12 1728000 0.0159 176.41 (898276) 0.6384 0.6972 0 (0) 16 (683236,683237,713956,713957,744676,..898277)
13 1728000 0.0159 188.84 (837317) 0.7538 0.8263 0 (0) 16 (622276,622277,652996,652997,683716,..837317)
14 1728000 0.0159 178.83 (1022117) 0.5803 0.6995 0 (0) 16 (807076,807077,837796,837797,868516,..1022117)
15 1728000 0.0159 187.17 (838277) 0.7163 0.8367 0 (0) 16 (623236,623237,653956,653957,684676,..838277)
16 1728000 0.0159 184.31 (992357) 0.6860 0.9137 0 (0) 16 (777316,777317,808036,808037,838756,..992357)
17 1728000 0.0159 190.75 (962117) 0.6607 0.9281 0 (0) 17 (716356,747076,747077,777796,777797,..962117)
18 1728000 0.0159 186.46 (870437) 0.6505 0.9303 0 (0) 16 (655396,655397,686116,686117,716836,..870437)
19 1728000 0.0159 187.62 (901636) 0.8962 0.9769 0 (0) 16 (686596,686597,717316,717317,748036,..901637)
20 1728000 0.0159 187.89 (748517) 1.0297 1.0907 0 (0) 16 (533476,533477,564196,564197,594916,..748517)
21 1728000 0.0159 177.84 (779716) 0.9255 1.0430 0 (0) 16 (564676,564677,595396,595397,626116,..779717)
22 1728000 0.0159 192.42 (780197) 0.6549 0.9060 0 (0) 18 (534436,534437,565156,565157,595876,..780197)
23 1728000 0.0159 179.99 (719236) 0.8476 1.0329 0 (0) 16 (504196,504197,534916,534917,565636,..719237)
24 1800000 0.0725 8.75 (1545420) 0.8685 0.6879 0 (0)
25 1800000 0.0725 20.19 (1550204) 0.8268 0.7200 0 (0)
26 1800000 0.0725 34.02 (14704) 0.6771 0.7340 0 (0) 104 (14704,14705,46704,46705,78704,..1774705)
27 1800000 0.0725 48.47 (1519205) 0.6290 0.6766 0 (0) 112 (15204,15205,47204,47205,79204,..1775205)
28 1800000 0.0725 65.64 (1711705) 0.6870 0.7524 0 (0) 112 (15704,15705,47704,47705,79704,..1775705)
29 1800000 0.0725 108.41 (1616204) 0.4684 0.9344 0 (0) 112 (16204,16205,48204,48205,80204,..1776205)
30 1800000 0.0725 108.17 (1680704) 0.8166 1.0311 0 (0) 112 (16704,16705,48704,48705,80704,..1776705)
31 1800000 0.0725 123.81 (49205) 0.6050 1.1110 0 (0) 112 (17204,17205,49204,49205,81204,..1777205)
32 1800000 0.0725 113.42 (17704) 0.6158 0.9614 0 (0) 112 (17704,17705,49704,49705,81704,..1777705)
33 1800000 0.0725 184.94 (1458204) 1.0111 1.7618 0 (0) 112 (18204,18205,50204,50205,82204,..1778205)
34 1800000 0.0725 194.72 (1490704) 1.1291 1.7317 0 (0) 98 (18704,18705,50704,50705,82704,..1778705)
35 1800000 0.0725 185.56 (339205) 0.5819 1.5599 0 (0) 112 (19204,19205,51204,51205,83204,..1779205)
36 1800000 0.0725 30.45 (227051) 0.9345 1.0711 0 (0) 1 (227051)
37 1800000 0.0725 184.61 (1780205) 0.7439 1.5621 0 (0) 112 (20204,20205,52204,52205,84204,..1780205)
38 1800000 0.0725 28.30 (329851) 0.9923 0.8368 0 (0)
39 1800000 0.0725 26.15 (183951) 0.9777 0.8443 0 (0)
40 1800000 0.0725 6.75 (1136462) 0.9447 0.7415 0 (0)
41 1800000 0.0725 6.03 (1128961) 0.8416 0.6431 0 (0)
42 1800000 0.0725 6.51 (1557012) 0.8961 0.6698 0 (0)
43 1800000 0.0725 7.08 (1294060) 0.7015 0.6162 0 (0)
44 540000 0.0032 9.30 (470245) 0.7457 0.5968 0 (0)
45 540000 0.0032 17.88 (55184) 0.6024 0.9190 0 (0)
46 540000 0.0032 104.43 (535411) 1.0158 1.0948 0 (0) 50 (305010,305011,314610,314611,324210,..535411)
47 540000 0.0032 17.16 (261898) 0.8780 1.0232 0 (0)
48 540000 0.0032 17.65 (132511) 0.5251 0.4697 0 (0)
49 540000 0.0032 166.41 (535860) 0.4444 1.4015 0 (0) 84 (142260,142261,151860,151861,161460,..535861)
50 540000 0.0032 86.30 (132810) 0.6620 0.6966 0 (0) 14 (46410,46411,56010,56011,84810,..132811)
51 540000 0.0032 193.36 (372961) 0.7880 1.6782 0 (0) 62 (8160,8161,17760,17761,27360,..372961)
52 540000 0.0032 113.96 (133110) 0.5096 0.7058 0 (0) 14 (46710,46711,56310,56311,85110,..133111)
53 540000 0.0032 191.69 (325261) 0.9986 1.6550 0 (0) 54 (8460,8461,18060,18061,27660,..325261)
54 540000 0.0032 123.74 (133411) 0.5987 0.7777 0 (0) 14 (47010,47011,56610,56611,85410,..133411)
55 540000 0.0032 193.60 (287161) 0.6729 1.5368 0 (0) 46 (8760,8761,18360,18361,27960,..287161)
56 540000 0.0032 194.79 (133711) 0.8196 1.3677 0 (0) 12 (47310,47311,56910,56911,85710,..133711)
57 540000 0.0032 19.08 (410169) 0.7227 1.1486 0 (0)
58 540000 0.0032 192.16 (38010) 0.7522 1.3604 0 (0) 8 (9210,9211,18810,18811,28410,..38011)
59 540000 0.0032 18.12 (360437) 0.8875 1.1737 0 (0)
60 540000 0.0032 23.84 (316830) 1.0781 1.2241 0 (0)
61 540000 0.0032 27.90 (316815) 1.1200 1.2471 0 (0)
62 540000 0.0032 22.18 (316830) 0.9777 1.0933 0 (0)
63 540000 0.0032 26.94 (317310) 1.1227 1.2478 0 (0)

Reference: 3.12.18-rt25-0.gf8a6df6-rt, nohz_idle, cpuset switches tick ON for rt set

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames   Min    Max(Frame)      Avg    Sigma   LastTrans  Fliers(Frames)
 4 1727993 0.0159 4.07 (2599) 0.1072 0.1890 0 (0)
 5 1727993 0.0159 4.55 (807476) 0.0954 0.1669 0 (0)
 6 1727993 0.0159 3.80 (38023) 0.1321 0.2110 0 (0)
 7 1727994 0.0159 3.35 (1724219) 0.0898 0.1628 0 (0)
 8 1727994 0.0159 3.80 (109400) 0.0957 0.1852 0 (0)
 9 1727994 0.0159 3.83 (710923) 0.1001 0.1779 0 (0)
10 1727994 0.0159 3.35 (529312) 0.1009 0.1741 0 (0)
11 1727995 0.0159 3.83 (1372590) 0.0935 0.1740 0 (0)
12 1727995 0.0159 3.83 (51129) 0.0857 0.1724 0 (0)
13 1727995 0.0159 3.83 (1273109) 0.1028 0.1852 0 (0)
14 1727996 0.0159 3.59 (486904) 0.1005 0.1811 0 (0)
15 1727996 0.0159 3.35 (691340) 0.1589 0.1899 0 (0)
16 1727996 0.0159 4.07 (1638706) 0.1340 0.2526 0 (0)
17 1727997 0.0159 4.55 (913535) 0.1110 0.2050 0 (0)
18 1727997 0.0159 4.31 (1704012) 0.1193 0.2129 0 (0)
19 1727997 0.0159 5.23 (1273925) 0.1434 0.2372 0 (0)
20 1727997 0.0159 5.26 (16547) 0.1119 0.2259 0 (0)
21 1727998 0.0159 5.71 (341896) 0.1893 0.2458 0 (0)
22 1727998 0.0159 4.55 (1276554) 0.1005 0.1961 0 (0)
23 1727998 0.0159 5.71 (1029507) 0.2141 0.2460 0 (0)
24 1799998 0.0725 3.98 (1551231) 0.1059 0.0518 0 (0)
25 1799998 0.0725 2.79 (272233) 0.1192 0.0866 0 (0)
26 1799998 0.0725 3.03 (272233) 0.1817 0.1317 0 (0)
27 1799999 0.0725 3.03 (272233) 0.1426 0.1009 0 (0)
28 1799999 0.0725 2.79 (402235) 0.2632 0.2574 0 (0)
29 1799999 0.0725 2.46 (387055) 0.1109 0.0709 0 (0)
30 1799999 0.0725 3.03 (387054) 0.1301 0.0860 0 (0)
31 1800000 0.0725 4.60 (1329743) 0.3274 0.2551 0 (0)
32 1800000 0.0725 2.93 (867055) 0.1076 0.0535 0 (0)
33 1800000 0.0725 2.93 (867055) 0.1049 0.0524 0 (0)
34 1800000 0.0725 3.50 (1347054) 0.1132 0.0740 0 (0)
35 1800000 0.0725 3.27 (867054) 0.1076 0.0857 0 (0)
36 1800000 0.0725 3.27 (867054) 0.2015 0.1381 0 (0)
37 1800000 0.0725 3.74 (1347054) 0.1020 0.0461 0 (0)
38 1800000 0.0725 3.17 (867055) 0.1118 0.0752 0 (0)
39 1800000 0.0725 2.93 (1347055) 0.1092 0.0624 0 (0)
40 1800000 0.0725 2.55 (867054) 0.1126 0.0703 0 (0)
41 1800000 0.0725 2.93 (867055) 0.1092 0.0560 0 (0)
42 1800000 0.0725 3.65 (387055) 0.2079 0.1424 0 (0)
43 1800000 0.0725 6.51 (905345) 0.3940 0.3751 0 (0)
44 539999 0.0032 3.10 (260115) 0.3366 0.2650 0 (0)
45 539999 0.0032 2.62 (116115) 0.3365 0.2648 0 (0)
46 539999 0.0032 4.53 (17248) 0.0854 0.2177 0 (0)
47 539999 0.0032 4.06 (142) 0.0767 0.1120 0 (0)
48 539999 0.0032 3.10 (260116) 0.0604 0.1029 0 (0)
49 539999 0.0032 3.10 (404115) 0.0901 0.1160 0 (0)
50 539999 0.0032 3.10 (404116) 0.1026 0.1449 0 (0)
51 539999 0.0032 2.86 (260116) 0.1019 0.1571 0 (0)
52 539999 0.0032 3.10 (116115) 0.0776 0.1190 0 (0)
53 539999 0.0032 3.10 (260115) 0.0719 0.1121 0 (0)
54 539999 0.0032 3.10 (260115) 0.3323 0.2669 0 (0)
55 539999 0.0032 3.10 (260115) 0.3534 0.2873 0 (0)
56 539999 0.0032 4.53 (422169) 0.1143 0.2653 0 (0)
57 539999 0.0032 3.10 (260116) 0.1021 0.2007 0 (0)
58 539999 0.0032 3.10 (116116) 0.0996 0.2120 0 (0)
59 539999 0.0032 2.86 (116116) 0.0989 0.2017 0 (0)
60 539999 0.0032 2.86 (116115) 0.3619 0.2676 0 (0)
61 539999 0.0032 2.86 (116115) 0.3453 0.2593 0 (0)
62 539999 0.0032 2.86 (116116) 0.1029 0.2016 0 (0)
63 539999 0.0032 3.10 (116116) 0.0858 0.1893 0 (0)

3.14.4-rt5 + patches (hacks from reference)   nohz_cpus=4-63

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames   Min    Max(Frame)      Avg    Sigma   LastTrans  Fliers(Frames)
 4 1728000 0.0159 10.98 (1625029) 0.4708 0.6897 0 (0)
 5 1728000 0.0159 11.94 (1612693) 0.6770 0.8055 0 (0)
 6 1728000 0.0159 11.90 (1544891) 0.6773 0.6996 0 (0)
 7 1728000 0.0159 13.10 (1357976) 0.5167 0.6947 0 (0)
 8 1728000 0.0159 21.71 (1331577) 0.5297 0.8358 0 (0)
 9 1728000 0.0159 21.68 (1331576) 0.6269 0.7978 0 (0)
10 1728000 0.0159 19.53 (889433) 0.8569 0.8690 0 (0)
11 1728000 0.0159 29.82 (1508316) 0.5446 0.7971 0 (0)
12 1728000 0.0159 24.81 (1502988) 0.5601 0.8393 0 (0)
13 1728000 0.0159 21.71 (1476396) 0.7357 0.8276 0 (0)
14 1728000 0.0159 12.86 (1178839) 0.5667 0.8542 0 (0)
15 1728000 0.0159 11.90 (1671304) 0.6289 0.6630 0 (0)
16 1728000 0.0159 23.35 (1493387) 0.7169 1.1261 0 (0)
17 1728000 0.0159 17.87 (1463627) 0.9685 1.3873 0 (0)
18 1728000 0.0159 16.91 (826801) 0.7671 1.0588 0 (0)
19 1728000 0.0159 19.06 (1238264) 0.6330 0.9972 0 (0)
20 1728000 0.0159 37.18 (270001) 1.3230 1.9480 0 (0) 15 (266593,266594,268369,268370,270001,..876386)
21 1728000 0.0159 40.52 (270769) 1.3299 1.6877 0 (0) 14 (266593,266594,268369,268370,270001,..876386)
22 1728000 0.0159 28.86 (273746) 1.3284 2.2402 0 (0)
23 1728000 0.0159 39.35 (270770) 1.2386 1.6255 0 (0) 13 (266593,266594,268369,268370,270001,..273746)
24 1800000 0.0725 36.41 (282051) 1.3319 1.7074 0 (0) 9 (279551,279552,281251,281252,282051,..285152)
25 1800000 0.0725 40.60 (281252) 1.5767 1.8895 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
26 1800000 0.0725 44.42 (281252) 1.4205 1.7594 0 (0) 16 (277701,277702,279551,279552,281251,..926361)
27 1800000 0.0725 42.27 (281252) 1.3342 1.7801 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
28 1800000 0.0725 43.08 (281251) 0.8064 1.2355 0 (0) 16 (277701,277702,279551,279552,281251,..290202)
29 1800000 0.0725 44.18 (281252) 0.9319 1.0451 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
30 1800000 0.0725 41.56 (279552) 1.5054 1.7734 0 (0) 19 (277701,277702,279551,279552,281251,..1379844)
31 1800000 0.0725 43.23 (285152) 0.8076 0.9825 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
32 1800000 0.0725 64.21 (281252) 0.8604 1.6229 0 (0) 56 (277701,277702,279551,279552,281251,..420022)
33 1800000 0.0725 70.88 (281252) 0.8443 1.7623 0 (0) 843 (252701,264451,264452,269802,276752,..426672)
34 1800000 0.0725 74.46 (281252) 0.8727 1.8996 0 (0) 1741 (281251,281252,282051,282052,283601,..445622)
35 1800000 0.0725 50.14 (412212) 0.8905 2.0577 0 (0) 2716 (368532,369931,369932,372562,372581,..457072)
36 1800000 0.0725 31.16 (466021) 0.8785 0.8549 0 (0) 2 (466021,466022)
37 1800000 0.0725 44.75 (415861) 0.7989 1.3541 0 (0) 323 (408731,408732,411281,413861,413862,..490232)
38 1800000 0.0725 47.37 (411811) 0.9474 2.0748 0 (0) 2936 (400292,400371,400372,400771,400772,..495282)
39 1800000 0.0725 27.97 (466872) 0.8778 0.7751 0 (0)
40 1800000 0.0725 7.56 (152518) 0.6923 0.6879 0 (0)
41 1800000 0.0725 7.32 (146921) 0.6207 0.6168 0 (0)
42 1800000 0.0725 6.99 (1756819) 0.6773 0.6053 0 (0)
43 1800000 0.0725 6.27 (1275231) 0.8971 0.7166 0 (0)
44 540000 0.0032 22.18 (416117) 1.2127 1.3930 0 (0)
45 540000 0.0032 34.34 (124668) 1.3483 2.0767 0 (0) 5 (83864,84374,85574,124667,124668)
46 540000 0.0032 32.43 (123279) 1.2910 2.1807 0 (0) 7 (123278,123279,123353,123354,124928,..124953)
47 540000 0.0032 30.04 (83864) 1.2735 1.9704 0 (0) 1 (83864)
48 540000 0.0032 7.63 (233683) 0.8320 0.7331 0 (0)
49 540000 0.0032 5.73 (204097) 0.6421 0.5255 0 (0)
50 540000 0.0032 6.44 (72596) 0.5834 0.5951 0 (0)
51 540000 0.0032 6.67 (468680) 0.5008 0.5379 0 (0)
52 540000 0.0032 7.63 (394623) 0.6245 0.5303 0 (0)
53 540000 0.0032 5.96 (456858) 0.4830 0.4921 0 (0)
54 540000 0.0032 5.73 (361510) 0.7487 0.5892 0 (0)
55 540000 0.0032 5.96 (108106) 0.5493 0.5178 0 (0)
56 540000 0.0032 20.98 (539973) 1.1312 0.9690 0 (0)
57 540000 0.0032 7.15 (537416) 0.4435 0.4855 0 (0)
58 540000 0.0032 11.45 (537372) 0.5048 0.5684 0 (0)
59 540000 0.0032 15.02 (537417) 0.6277 0.8161 0 (0)
60 540000 0.0032 20.02 (511100) 0.7036 1.2132 0 (0)
61 540000 0.0032 28.14 (471348) 0.9544 1.3316 0 (0)
62 540000 0.0032 25.04 (461373) 0.7629 1.2334 0 (0)
63 540000 0.0032 17.88 (532206) 0.6224 1.2652 0 (0)

3.14.4-rt5 + patches, nohz_cpus=4-63, but with rt cpuset ticked

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!

Cpu Frames   Min    Max(Frame)      Avg    Sigma   LastTrans  Fliers(Frames)
 4 1727990 0.0159 15.75 (1110991) 0.2705 0.6782 0 (0)
 5 1727991 0.0159 18.10 (1152942) 0.2295 0.5379 0 (0)
 6 1727991 0.0159 12.14 (740256) 0.1871 0.4767 0 (0)
 7 1727991 0.0159 13.61 (1398636) 0.1943 0.4162 0 (0)
 8 1727991 0.0159 21.68 (1128942) 0.2343 0.4782 0 (0)
 9 1727991 0.0159 17.66 (1373532) 0.2449 0.6676 0 (0)
10 1727991 0.0159 21.68 (1370843) 0.3469 0.7723 0 (0)
11 1727993 0.0159 14.56 (1654553) 0.3085 0.5460 0 (0)
12 1727994 0.0159 16.91 (1640920) 0.6324 0.6797 0 (0)
13 1727994 0.0159 14.05 (1646008) 0.4396 0.5460 0 (0)
14 1727994 0.0159 17.90 (1385628) 0.1940 0.5126 0 (0)
15 1727994 0.0159 14.53 (1324235) 0.5829 0.6251 0 (0)
16 1727994 0.0159 18.61 (1170511) 0.2769 0.7185 0 (0)
17 1727995 0.0159 15.96 (147911) 0.6323 1.1918 0 (0)
18 1727995 0.0159 17.63 (1147518) 0.3379 0.8251 0 (0)
19 1727997 0.0159 16.67 (1110990) 0.3504 0.8739 0 (0)
20 1727997 0.0159 25.02 (1371899) 0.4171 0.9395 0 (0)
21 1727997 0.0159 18.10 (1604824) 0.5120 0.9121 0 (0)
22 1727997 0.0159 16.67 (1292219) 0.3707 1.0057 0 (0)
23 1727997 0.0159 21.92 (1311275) 0.7368 0.9864 0 (0)
24 1799997 0.0725 11.13 (513155) 0.2668 0.4618 0 (0)
25 1799997 0.0725 14.47 (57648) 1.3242 1.3780 0 (0)
26 1799997 0.0725 16.14 (790209) 0.3020 0.6119 0 (0)
27 1799999 0.0725 18.52 (406878) 0.9193 1.2727 0 (0)
28 1799999 0.0725 16.38 (1333567) 0.9898 1.0894 0 (0)
29 1799999 0.0725 17.57 (1388068) 1.5751 1.4865 0 (0)
30 1800000 0.0725 15.90 (944311) 0.8447 1.0841 0 (0)
31 1800000 0.0725 19.00 (1118689) 1.2406 1.2589 0 (0)
32 1800000 0.0725 33.21 (623752) 0.2034 0.5721 0 (0) 2 (623751,623752)
33 1800000 0.0725 43.80 (623751) 0.1869 0.5132 0 (0) 40 (555521,555522,558341,558342,561411,..634502)
34 1800000 0.0725 54.91 (623752) 0.2003 0.6696 0 (0) 256 (621921,621922,622231,622232,622251,..634652)
35 1800000 0.0725 11.61 (1689123) 0.2855 0.3485 0 (0)
36 1800000 0.0725 19.95 (662351) 0.1990 0.4114 0 (0)
37 1800000 0.0725 7.32 (775961) 0.1888 0.3463 0 (0)
38 1800000 0.0725 45.23 (623751) 0.1812 0.5458 0 (0) 72 (555521,555522,558341,558342,561411,..633852)
39 1800000 0.0725 6.13 (2023) 0.1899 0.3141 0 (0)
40 1800000 0.0725 3.03 (76948) 0.1348 0.1367 0 (0)
41 1800000 0.0725 3.27 (278393) 0.1473 0.1599 0 (0)
42 1800000 0.0725 3.74 (149791) 0.1490 0.1477 0 (0)
43 1800000 0.0725 3.74 (76948) 0.3163 0.2310 0 (0)
44 540000 0.0032 32.18 (187964) 0.2410 0.9227 0 (0) 19 (166655,166656,186674,186675,187289,..188639)
45 540000 0.0032 13.35 (118119) 0.9129 0.6304 0 (0)
46 540000 0.0032 38.87 (187290) 0.3234 1.1625 0 (0) 227 (167501,167502,168422,168423,168575,..190395)
47 540000 0.0032 10.97 (70989) 0.7904 0.5492 0 (0)
48 540000 0.0032 16.22 (526242) 0.1975 0.3578 0 (0)
49 540000 0.0032 4.06 (113415) 0.1632 0.2182 0 (0)
50 540000 0.0032 36.00 (348112) 1.0410 1.1357 0 (0) 4 (348112,348113,505655,505656)
51 540000 0.0032 7.63 (410141) 0.1802 0.3413 0 (0)
52 540000 0.0032 7.63 (388792) 0.1467 0.2892 0 (0)
53 540000 0.0032 22.41 (526242) 0.8564 1.0304 0 (0)
54 540000 0.0032 16.22 (526242) 0.3234 0.5915 0 (0)
55 540000 0.0032 20.51 (523077) 0.8910 1.0718 0 (0)
56 540000 0.0032 20.51 (433008) 0.4362 1.3129 0 (0)
57 540000 0.0032 23.84 (429227) 0.4652 1.3327 0 (0)
58 540000 0.0032 23.60 (409787) 0.3603 1.1818 0 (0)
59 540000 0.0032 21.69 (379973) 0.5904 1.3033 0 (0)
60 540000 0.0032 24.80 (365811) 0.6453 1.4206 0 (0)
61 540000 0.0032 26.23 (351861) 0.8988 1.2698 0 (0)
62 540000 0.0032 24.32 (354042) 0.9292 1.2590 0 (0)
63 540000 0.0032 24.80 (354441) 0.6686 1.3598 0 (0)

3.14.4-rt5 + patches, no nohz_full mask supplied, rt cpuset ticked

FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!

Cpu Frames   Min    Max(Frame)      Avg    Sigma   LastTrans  Fliers(Frames)
 4 1727999 0.0159 19.77 (255927) 0.6208 0.8240 0 (0)
 5 1727999 0.0159 5.02 (647067) 0.0868 0.1769 0 (0)
 6 1727999 0.0159 5.23 (643274) 0.0783 0.1614 0 (0)
 7 1728000 0.0159 5.26 (645483) 0.0940 0.1667 0 (0)
 8 1728000 0.0159 5.23 (1308493) 0.0928 0.2128 0 (0)
 9 1728000 0.0159 4.31 (7073) 0.0866 0.1984 0 (0)
10 1728000 0.0159 4.78 (1409) 0.0857 0.2031 0 (0)
11 1728000 0.0159 6.18 (1177115) 0.0946 0.2038 0 (0)
12 1728000 0.0159 5.02 (1607995) 0.0921 0.2065 0 (0)
13 1728000 0.0159 5.98 (1164828) 0.1021 0.2268 0 (0)
14 1728000 0.0159 7.61 (1143227) 0.1055 0.2670 0 (0)
15 1728000 0.0159 5.94 (1122923) 0.1346 0.2006 0 (0)
16 1728000 0.0159 5.98 (285214) 0.1058 0.2706 0 (0)
17 1728000 0.0159 9.04 (1143131) 0.1198 0.3034 0 (0)
18 1728000 0.0159 5.98 (962842) 0.0934 0.2315 0 (0)
19 1728000 0.0159 5.74 (1747) 0.1115 0.2409 0 (0)
20 1728000 0.0159 5.94 (264838) 0.0931 0.2247 0 (0)
21 1728000 0.0159 7.88 (389138) 0.1144 0.2702 0 (0)
22 1728000 0.0159 7.85 (588413) 0.1962 0.2656 0 (0)
23 1728000 0.0159 5.98 (1110060) 0.1984 0.3365 0 (0)
24 1800000 0.0725 2.79 (796117) 0.1595 0.1585 0 (0)
25 1800000 0.0725 2.31 (316117) 0.1086 0.0560 0 (0)
26 1800000 0.0725 2.46 (796118) 0.1074 0.0544 0 (0)
27 1800000 0.0725 3.98 (613155) 0.1087 0.0699 0 (0)
28 1800000 0.0725 3.17 (613156) 0.1085 0.0585 0 (0)
29 1800000 0.0725 8.18 (613156) 0.2171 0.2721 0 (0)
30 1800000 0.0725 7.94 (612931) 0.2242 0.2651 0 (0)
31 1800000 0.0725 7.94 (612931) 0.2499 0.3039 0 (0)
32 1800000 0.0725 3.03 (1756117) 0.1260 0.0819 0 (0)
33 1800000 0.0725 3.27 (1085885) 0.5809 0.4470 0 (0)
34 1800000 0.0725 2.55 (316117) 0.1056 0.0504 0 (0)
35 1800000 0.0725 3.27 (1965) 0.2121 0.4379 0 (0)
36 1800000 0.0725 4.13 (1) 0.1324 0.0874 0 (0)
37 1800000 0.0725 4.13 (1) 0.1808 0.1423 0 (0)
38 1800000 0.0725 4.13 (1) 0.2027 0.1488 0 (0)
39 1800000 0.0725 4.13 (1) 0.2094 0.1533 0 (0)
40 1800000 0.0725 2.93 (316118) 0.1093 0.0742 0 (0)
41 1800000 0.0725 3.03 (1756116) 0.1971 0.1469 0 (0)
42 1800000 0.0725 3.27 (1210118) 0.6525 0.5411 0 (0)
43 1800000 0.0725 3.03 (316117) 0.1048 0.0498 0 (0)
44 539999 0.0032 2.86 (94835) 0.0828 0.1364 0 (0)
45 539999 0.0032 2.62 (94834) 0.1914 0.1657 0 (0)
46 540000 0.0032 3.10 (94834) 0.2435 0.1900 0 (0)
47 540000 0.0032 2.38 (94834) 0.2505 0.1965 0 (0)
48 540000 0.0032 3.82 (524593) 0.1307 0.2533 0 (0)
49 540000 0.0032 2.86 (447946) 0.0904 0.2040 0 (0)
50 540000 0.0032 3.34 (434056) 0.2087 0.2865 0 (0)
51 540000 0.0032 3.10 (94835) 0.0921 0.2016 0 (0)
52 540000 0.0032 7.39 (522302) 0.3460 0.3597 0 (0)
53 540000 0.0032 7.39 (522302) 0.3449 0.3567 0 (0)
54 540000 0.0032 7.15 (522302) 0.3259 0.3550 0 (0)
55 540000 0.0032 7.39 (522302) 0.3274 0.3578 0 (0)
56 540000 0.0032 6.20 (387845) 0.1547 0.3660 0 (0)
57 540000 0.0032 7.87 (367398) 0.1425 0.3847 0 (0)
58 540000 0.0032 7.63 (347661) 0.1272 0.3915 0 (0)
59 540000 0.0032 9.30 (347660) 0.1239 0.3552 0 (0)
60 540000 0.0032 12.16 (152143) 0.3169 0.3407 0 (0)
61 540000 0.0032 10.02 (152143) 0.3359 0.3369 0 (0)
62 540000 0.0032 12.16 (152143) 0.3347 0.3341 0 (0)
63 540000 0.0032 12.40 (152143) 0.2970 0.3460 0 (0)
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile

From: Paul E. McKenney @ 2014-05-18  5:20 UTC
To: Mike Galbraith
Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
    Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, May 18, 2014 at 06:22:34AM +0200, Mike Galbraith wrote:
> On Thu, 2014-05-15 at 05:18 +0200, Mike Galbraith wrote:
> > On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
> >
> > > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > > for -rt kernels in production environments.
> >
> > I took 3.14-rt out for a quick spin on my 64 core box, it didn't work at
> > all with 60 cores isolated.  I didn't have time to rummage, but it looks
> > like there are still bugs to squash.
>
> I tested a bit more yesterday.  With only NO_HZ_FULL (no all), it did go
> tickless.  A 10 second sample of perturbation numbers below.  Pretty
> noisy, but it does work in rt.
>
> Below that, some jitter numbers, using a simplified and imperfect model
> of a high end rt video game (game over, insert 1 gold bar to continue
> variety of high end) executive synchronizing a simple load on 60 cores.
> Bottom line there was if user thinks booting nohz_full=set to get any
> core quiescence provided by that should improve jitter for a threaded
> load despite not being able to shut tick down, he's wrong.

If you are saying that turning on nohz_full doesn't help unless you
also ensure that there is only one runnable task per CPU, I completely
agree.  If you are saying something else, you lost me.
;-)

							Thanx, Paul

> -Mike
>
> vogelweide:/abuild/mike/:[0]# head -180 xx|tail -60
> pert/s: 500 >14.10us: 2 min: 1.90 max: 96.86 avg: 5.17 sum/s: 2586us overhead: 0.26%
> pert/s: 1 >18.63us: 0 min: 3.38 max: 8.37 avg: 4.71 sum/s: 5us overhead: 0.00%
> pert/s: 1 >18.54us: 0 min: 3.41 max: 8.51 avg: 4.61 sum/s: 5us overhead: 0.00%
> pert/s: 1 >18.78us: 0 min: 3.45 max: 8.67 avg: 4.41 sum/s: 4us overhead: 0.00%
> pert/s: 1 >19.60us: 0 min: 2.68 max: 7.47 avg: 4.30 sum/s: 4us overhead: 0.00%
> pert/s: 1 >20.61us: 0 min: 2.60 max: 7.54 avg: 3.94 sum/s: 4us overhead: 0.00%
> pert/s: 1 >21.03us: 0 min: 2.70 max: 7.81 avg: 3.91 sum/s: 4us overhead: 0.00%
> pert/s: 1 >19.89us: 0 min: 2.65 max: 7.83 avg: 3.98 sum/s: 4us overhead: 0.00%
> pert/s: 1 >29.36us: 0 min: 3.78 max: 9.82 avg: 6.10 sum/s: 6us overhead: 0.00%
> pert/s: 1 >28.86us: 0 min: 4.56 max: 10.12 avg: 6.36 sum/s: 6us overhead: 0.00%
> pert/s: 1 >21.34us: 0 min: 2.54 max: 7.79 avg: 3.38 sum/s: 3us overhead: 0.00%
> pert/s: 1 >30.12us: 0 min: 3.51 max: 8.41 avg: 4.47 sum/s: 4us overhead: 0.00%
> pert/s: 1 >21.01us: 0 min: 2.44 max: 7.36 avg: 3.33 sum/s: 3us overhead: 0.00%
> pert/s: 1 >22.41us: 0 min: 2.42 max: 7.71 avg: 3.60 sum/s: 4us overhead: 0.00%
> pert/s: 1 >30.37us: 0 min: 3.46 max: 8.62 avg: 4.49 sum/s: 4us overhead: 0.00%
> pert/s: 1 >29.50us: 0 min: 3.43 max: 9.23 avg: 4.13 sum/s: 4us overhead: 0.00%
> pert/s: 1 >20.66us: 0 min: 2.49 max: 7.73 avg: 4.24 sum/s: 4us overhead: 0.00%
> pert/s: 1 >33.87us: 0 min: 4.45 max: 9.59 avg: 5.63 sum/s: 6us overhead: 0.00%
> pert/s: 1 >34.70us: 0 min: 4.47 max: 10.15 avg: 6.41 sum/s: 6us overhead: 0.00%
> pert/s: 1 >29.62us: 0 min: 4.49 max: 9.87 avg: 5.69 sum/s: 6us overhead: 0.00%
> pert/s: 1 >36.92us: 0 min: 3.53 max: 9.41 avg: 4.48 sum/s: 4us overhead: 0.00%
> pert/s: 1 >35.31us: 0 min: 3.69 max: 9.00 avg: 5.30 sum/s: 5us overhead: 0.00%
> pert/s: 1 >36.29us: 0 min: 3.34 max: 8.48 avg: 4.48 sum/s: 4us overhead: 0.00%
> pert/s: 1 >34.90us: 0 min: 3.39 max: 9.21 avg: 4.45 sum/s: 4us overhead: 0.00%
> pert/s: 1 >34.23us: 0 min: 3.37 max: 8.44 avg: 4.54 sum/s: 5us overhead: 0.00%
> pert/s: 1 >34.45us: 0 min: 0.05 max: 9.41 avg: 4.40 sum/s: 5us overhead: 0.00%
> pert/s: 1 >35.31us: 0 min: 3.89 max: 9.18 avg: 4.30 sum/s: 5us overhead: 0.00%
> pert/s: 1 >35.98us: 0 min: 2.80 max: 9.28 avg: 4.74 sum/s: 5us overhead: 0.00%
> pert/s: 1 >33.89us: 0 min: 3.15 max: 9.67 avg: 5.07 sum/s: 5us overhead: 0.00%
> pert/s: 1 >35.16us: 0 min: 2.56 max: 9.40 avg: 4.84 sum/s: 5us overhead: 0.00%
> pert/s: 1 >36.37us: 0 min: 4.49 max: 9.48 avg: 6.12 sum/s: 6us overhead: 0.00%
> pert/s: 1 >38.10us: 0 min: 0.04 max: 34.86 avg: 6.62 sum/s: 13us overhead: 0.00%
> pert/s: 1 >35.11us: 0 min: 5.05 max: 11.56 avg: 5.88 sum/s: 6us overhead: 0.00%
> pert/s: 1 >36.88us: 0 min: 3.77 max: 12.37 avg: 6.13 sum/s: 6us overhead: 0.00%
> pert/s: 1 >34.37us: 1 min: 2.08 max:199.64 avg: 20.67 sum/s: 25us overhead: 0.00%
> pert/s: 1 >35.57us: 1 min: 2.11 max:198.61 avg: 19.17 sum/s: 25us overhead: 0.00%
> pert/s: 1 >33.89us: 1 min: 2.46 max:199.49 avg: 19.85 sum/s: 26us overhead: 0.00%
> pert/s: 1 >37.58us: 1 min: 2.34 max:199.79 avg: 19.59 sum/s: 25us overhead: 0.00%
> pert/s: 1 >34.57us: 0 min: 3.43 max: 13.37 avg: 5.86 sum/s: 6us overhead: 0.00%
> pert/s: 1 >21.10us: 1 min: 2.42 max:199.97 avg: 20.08 sum/s: 26us overhead: 0.00%
> pert/s: 1 >20.86us: 1 min: 2.23 max:194.83 avg: 19.69 sum/s: 26us overhead: 0.00%
> pert/s: 1 >22.47us: 1 min: 2.15 max:197.13 avg: 19.61 sum/s: 25us overhead: 0.00%
> pert/s: 1 >21.42us: 1 min: 2.24 max:198.75 avg: 19.70 sum/s: 26us overhead: 0.00%
> pert/s: 1 >34.85us: 0 min: 0.05 max: 10.83 avg: 3.80 sum/s: 5us overhead: 0.00%
> pert/s: 1 >33.72us: 0 min: 4.34 max: 11.78 avg: 6.04 sum/s: 6us overhead: 0.00%
> pert/s: 1 >21.49us: 2 min: 2.13 max:200.35 avg: 20.22 sum/s: 26us overhead: 0.00%
> pert/s: 1 >22.52us: 2 min: 2.32 max:197.07 avg: 21.02 sum/s: 27us overhead: 0.00%
> pert/s: 1 >22.35us: 2 min: 2.16 max:197.04 avg: 20.59 sum/s: 27us overhead: 0.00%
> pert/s: 1 >35.38us: 0 min: 3.20 max: 10.42 avg: 4.56 sum/s: 5us overhead: 0.00%
> pert/s: 306 >17.41us: 1 min: 1.31 max: 51.95 avg: 6.83 sum/s: 2091us overhead: 0.21%
> pert/s: 1 >99.81us: 1 min: 2.11 max:196.91 avg: 20.61 sum/s: 27us overhead: 0.00%
> pert/s: 1 >21.45us: 2 min: 2.31 max:196.62 avg: 20.49 sum/s: 27us overhead: 0.00%
> pert/s: 1 >97.14us: 1 min: 2.22 max:195.97 avg: 21.29 sum/s: 28us overhead: 0.00%
> pert/s: 1 >21.94us: 2 min: 2.25 max:199.98 avg: 20.14 sum/s: 26us overhead: 0.00%
> pert/s: 206 >75.07us: 1 min: 1.60 max:116.17 avg: 5.83 sum/s: 1202us overhead: 0.12%
> pert/s: 1 >96.39us: 1 min: 2.26 max:194.60 avg: 21.29 sum/s: 28us overhead: 0.00%
> pert/s: 1 >94.72us: 1 min: 2.20 max:193.63 avg: 21.32 sum/s: 28us overhead: 0.00%
> pert/s: 1 >97.23us: 1 min: 2.08 max:198.18 avg: 21.27 sum/s: 28us overhead: 0.00%
> pert/s: 1 >88.44us: 0 min: 2.28 max: 11.33 avg: 7.05 sum/s: 8us overhead: 0.00%
> pert/s: 1 >89.22us: 0 min: 3.59 max: 15.86 avg: 7.81 sum/s: 8us overhead: 0.00%
>
> model is not picky, calls frame jitter >30us a 'Flier', counts them, tags a few.
>
> 3.14.4-rt5 virgin source nohz_full=4-63
>
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames Min Max(Frame) Avg Sigma LastTrans Fliers(Frames)
> 4 1727998 0.0159 184.04 (1170916) 0.7079 0.7321 0 (0) 16 (955515,955516,986596,986597,1017316,..1561069)
> 5 1727998 0.0159 186.94 (1171397) 0.4114 0.6508 0 (0) 16 (956356,956357,987076,987077,1017796,..1171397)
> 6 1727999 0.0159 36.73 (595340) 0.8620 0.8818 0 (0) 11 (86411,86412,91211,91212,96011,..1209942)
> 7 1727999 0.0159 189.53 (1141636) 0.4791 0.6720 0 (0) 17 (895876,926596,926597,957316,957317,..1141637)
> 8 1728000 0.0159 184.07 (988517) 0.3885 0.6788 0 (0) 16 (773476,773477,804196,804197,834916,..988517)
> 9 1728000 0.0159 180.74 (1050437) 0.3514 0.6649 0 (0) 16 (835396,835397,866116,866117,896836,..1050437)
> 10 1728000 0.0159 188.84 (1020197) 0.4211 0.6945 0 (0) 16 (805156,805157,835876,835877,866596,..1020197)
> 11 1728000 0.0159 180.98 (959237) 0.3867 0.6802 0 (0) 16 (744196,744197,774916,774917,805636,..959237)
> 12 1728000 0.0159 176.41 (898276) 0.6384 0.6972 0 (0) 16 (683236,683237,713956,713957,744676,..898277)
> 13 1728000 0.0159 188.84 (837317) 0.7538 0.8263 0 (0) 16 (622276,622277,652996,652997,683716,..837317)
> 14 1728000 0.0159 178.83 (1022117) 0.5803 0.6995 0 (0) 16 (807076,807077,837796,837797,868516,..1022117)
> 15 1728000 0.0159 187.17 (838277) 0.7163 0.8367 0 (0) 16 (623236,623237,653956,653957,684676,..838277)
> 16 1728000 0.0159 184.31 (992357) 0.6860 0.9137 0 (0) 16 (777316,777317,808036,808037,838756,..992357)
> 17 1728000 0.0159 190.75 (962117) 0.6607 0.9281 0 (0) 17 (716356,747076,747077,777796,777797,..962117)
> 18 1728000 0.0159 186.46 (870437) 0.6505 0.9303 0 (0) 16 (655396,655397,686116,686117,716836,..870437)
> 19 1728000 0.0159 187.62 (901636) 0.8962 0.9769 0 (0) 16 (686596,686597,717316,717317,748036,..901637)
> 20 1728000 0.0159 187.89 (748517) 1.0297 1.0907 0 (0) 16 (533476,533477,564196,564197,594916,..748517)
> 21 1728000 0.0159 177.84 (779716) 0.9255 1.0430 0 (0) 16 (564676,564677,595396,595397,626116,..779717)
> 22 1728000 0.0159 192.42 (780197) 0.6549 0.9060 0 (0) 18 (534436,534437,565156,565157,595876,..780197)
> 23 1728000 0.0159 179.99 (719236) 0.8476 1.0329 0 (0) 16 (504196,504197,534916,534917,565636,..719237)
> 24 1800000 0.0725 8.75 (1545420) 0.8685 0.6879 0 (0)
> 25 1800000 0.0725 20.19 (1550204) 0.8268 0.7200 0 (0)
> 26 1800000 0.0725 34.02 (14704) 0.6771 0.7340 0 (0) 104 (14704,14705,46704,46705,78704,..1774705)
> 27 1800000 0.0725 48.47 (1519205) 0.6290 0.6766 0 (0) 112 (15204,15205,47204,47205,79204,..1775205)
> 28 1800000 0.0725 65.64 (1711705) 0.6870 0.7524 0 (0) 112 (15704,15705,47704,47705,79704,..1775705)
> 29 1800000 0.0725 108.41 (1616204) 0.4684 0.9344 0 (0) 112 (16204,16205,48204,48205,80204,..1776205)
> 30 1800000 0.0725 108.17 (1680704) 0.8166 1.0311 0 (0) 112 (16704,16705,48704,48705,80704,..1776705)
> 31 1800000 0.0725 123.81 (49205) 0.6050 1.1110 0 (0) 112 (17204,17205,49204,49205,81204,..1777205)
> 32 1800000 0.0725 113.42 (17704) 0.6158 0.9614 0 (0) 112 (17704,17705,49704,49705,81704,..1777705)
> 33 1800000 0.0725 184.94 (1458204) 1.0111 1.7618 0 (0) 112 (18204,18205,50204,50205,82204,..1778205)
> 34 1800000 0.0725 194.72 (1490704) 1.1291 1.7317 0 (0) 98 (18704,18705,50704,50705,82704,..1778705)
> 35 1800000 0.0725 185.56 (339205) 0.5819 1.5599 0 (0) 112 (19204,19205,51204,51205,83204,..1779205)
> 36 1800000 0.0725 30.45 (227051) 0.9345 1.0711 0 (0) 1 (227051)
> 37 1800000 0.0725 184.61 (1780205) 0.7439 1.5621 0 (0) 112 (20204,20205,52204,52205,84204,..1780205)
> 38 1800000 0.0725 28.30 (329851) 0.9923 0.8368 0 (0)
> 39 1800000 0.0725 26.15 (183951) 0.9777 0.8443 0 (0)
> 40 1800000 0.0725 6.75 (1136462) 0.9447 0.7415 0 (0)
> 41 1800000 0.0725 6.03 (1128961) 0.8416 0.6431 0 (0)
> 42 1800000 0.0725 6.51 (1557012) 0.8961 0.6698 0 (0)
> 43 1800000 0.0725 7.08 (1294060) 0.7015 0.6162 0 (0)
> 44 540000 0.0032 9.30 (470245) 0.7457 0.5968 0 (0)
> 45 540000 0.0032 17.88 (55184) 0.6024 0.9190 0 (0)
> 46 540000 0.0032 104.43 (535411) 1.0158 1.0948 0 (0) 50 (305010,305011,314610,314611,324210,..535411)
> 47 540000 0.0032 17.16 (261898) 0.8780 1.0232 0 (0)
> 48 540000 0.0032 17.65 (132511) 0.5251 0.4697 0 (0)
> 49 540000 0.0032 166.41 (535860) 0.4444 1.4015 0 (0) 84 (142260,142261,151860,151861,161460,..535861)
> 50 540000 0.0032 86.30 (132810) 0.6620 0.6966 0 (0) 14 (46410,46411,56010,56011,84810,..132811)
> 51 540000 0.0032 193.36 (372961) 0.7880 1.6782 0 (0) 62 (8160,8161,17760,17761,27360,..372961)
> 52 540000 0.0032 113.96 (133110) 0.5096 0.7058 0 (0) 14 (46710,46711,56310,56311,85110,..133111)
> 53 540000 0.0032 191.69 (325261) 0.9986 1.6550 0 (0) 54 (8460,8461,18060,18061,27660,..325261)
> 54 540000 0.0032 123.74 (133411) 0.5987 0.7777 0 (0) 14 (47010,47011,56610,56611,85410,..133411)
> 55 540000 0.0032 193.60 (287161) 0.6729 1.5368 0 (0) 46 (8760,8761,18360,18361,27960,..287161)
> 56 540000 0.0032 194.79 (133711) 0.8196 1.3677 0 (0) 12 (47310,47311,56910,56911,85710,..133711)
> 57 540000 0.0032 19.08 (410169) 0.7227 1.1486 0 (0)
> 58 540000 0.0032 192.16 (38010) 0.7522 1.3604 0 (0) 8 (9210,9211,18810,18811,28410,..38011)
> 59 540000 0.0032 18.12 (360437) 0.8875 1.1737 0 (0)
> 60 540000 0.0032 23.84 (316830) 1.0781 1.2241 0 (0)
> 61 540000 0.0032 27.90 (316815) 1.1200 1.2471 0 (0)
> 62 540000 0.0032 22.18 (316830) 0.9777 1.0933 0 (0)
> 63 540000 0.0032 26.94 (317310) 1.1227 1.2478 0 (0)
>
> Reference: 3.12.18-rt25-0.gf8a6df6-rt, nohz_idle, cpuset switches tick ON for rt set
>
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames Min Max(Frame) Avg Sigma LastTrans Fliers(Frames)
> 4 1727993 0.0159 4.07 (2599) 0.1072 0.1890 0 (0)
> 5 1727993 0.0159 4.55 (807476) 0.0954 0.1669 0 (0)
> 6 1727993 0.0159 3.80 (38023) 0.1321 0.2110 0 (0)
> 7 1727994 0.0159 3.35 (1724219) 0.0898 0.1628 0 (0)
> 8 1727994 0.0159 3.80 (109400) 0.0957 0.1852 0 (0)
> 9 1727994 0.0159 3.83 (710923) 0.1001 0.1779 0 (0)
> 10 1727994 0.0159 3.35 (529312) 0.1009 0.1741 0 (0)
> 11 1727995 0.0159 3.83 (1372590) 0.0935 0.1740 0 (0)
> 12 1727995 0.0159 3.83 (51129) 0.0857 0.1724 0 (0)
> 13 1727995 0.0159 3.83 (1273109) 0.1028 0.1852 0 (0)
> 14 1727996 0.0159 3.59 (486904) 0.1005 0.1811 0 (0)
> 15 1727996 0.0159 3.35 (691340) 0.1589 0.1899 0 (0)
> 16 1727996 0.0159 4.07 (1638706) 0.1340 0.2526 0 (0)
> 17 1727997 0.0159 4.55 (913535) 0.1110 0.2050 0 (0)
> 18 1727997 0.0159 4.31 (1704012) 0.1193 0.2129 0 (0)
> 19 1727997 0.0159 5.23 (1273925) 0.1434 0.2372 0 (0)
> 20 1727997 0.0159 5.26 (16547) 0.1119 0.2259 0 (0)
> 21 1727998 0.0159 5.71 (341896) 0.1893 0.2458 0 (0)
> 22 1727998 0.0159 4.55 (1276554) 0.1005 0.1961 0 (0)
> 23 1727998 0.0159 5.71 (1029507) 0.2141 0.2460 0 (0)
> 24 1799998 0.0725 3.98 (1551231) 0.1059 0.0518 0 (0)
> 25 1799998 0.0725 2.79 (272233) 0.1192 0.0866 0 (0)
> 26 1799998 0.0725 3.03 (272233) 0.1817 0.1317 0 (0)
> 27 1799999 0.0725 3.03 (272233) 0.1426 0.1009 0 (0)
> 28 1799999 0.0725 2.79 (402235) 0.2632 0.2574 0 (0)
> 29 1799999 0.0725 2.46 (387055) 0.1109 0.0709 0 (0)
> 30 1799999 0.0725 3.03 (387054) 0.1301 0.0860 0 (0)
> 31 1800000 0.0725 4.60 (1329743) 0.3274 0.2551 0 (0)
> 32 1800000 0.0725 2.93 (867055) 0.1076 0.0535 0 (0)
> 33 1800000 0.0725 2.93 (867055) 0.1049 0.0524 0 (0)
> 34 1800000 0.0725 3.50 (1347054) 0.1132 0.0740 0 (0)
> 35 1800000 0.0725 3.27 (867054) 0.1076 0.0857 0 (0)
> 36 1800000 0.0725 3.27 (867054) 0.2015 0.1381 0 (0)
> 37 1800000 0.0725 3.74 (1347054) 0.1020 0.0461 0 (0)
> 38 1800000 0.0725 3.17 (867055) 0.1118 0.0752 0 (0)
> 39 1800000 0.0725 2.93 (1347055) 0.1092 0.0624 0 (0)
> 40 1800000 0.0725 2.55 (867054) 0.1126 0.0703 0 (0)
> 41 1800000 0.0725 2.93 (867055) 0.1092 0.0560 0 (0)
> 42 1800000 0.0725 3.65 (387055) 0.2079 0.1424 0 (0)
> 43 1800000 0.0725 6.51 (905345) 0.3940 0.3751 0 (0)
> 44 539999 0.0032 3.10 (260115) 0.3366 0.2650 0 (0)
> 45 539999 0.0032 2.62 (116115) 0.3365 0.2648 0 (0)
> 46 539999 0.0032 4.53 (17248) 0.0854 0.2177 0 (0)
> 47 539999 0.0032 4.06 (142) 0.0767 0.1120 0 (0)
> 48 539999 0.0032 3.10 (260116) 0.0604 0.1029 0 (0)
> 49 539999 0.0032 3.10 (404115) 0.0901 0.1160 0 (0)
> 50 539999 0.0032 3.10 (404116) 0.1026 0.1449 0 (0)
> 51 539999 0.0032 2.86 (260116) 0.1019 0.1571 0 (0)
> 52 539999 0.0032 3.10 (116115) 0.0776 0.1190 0 (0)
> 53 539999 0.0032 3.10 (260115) 0.0719 0.1121 0 (0)
> 54 539999 0.0032 3.10 (260115) 0.3323 0.2669 0 (0)
> 55 539999 0.0032 3.10 (260115) 0.3534 0.2873 0 (0)
> 56 539999 0.0032 4.53 (422169) 0.1143 0.2653 0 (0)
> 57 539999 0.0032 3.10 (260116) 0.1021 0.2007 0 (0)
> 58 539999 0.0032 3.10 (116116) 0.0996 0.2120 0 (0)
> 59 539999 0.0032 2.86 (116116) 0.0989 0.2017 0 (0)
> 60 539999 0.0032 2.86 (116115) 0.3619 0.2676 0 (0)
> 61 539999 0.0032 2.86 (116115) 0.3453 0.2593 0 (0)
> 62 539999 0.0032 2.86 (116116) 0.1029 0.2016 0 (0)
> 63 539999 0.0032 3.10 (116116) 0.0858 0.1893 0 (0)
>
> 3.14.4-rt5 + patches (hacks from reference) nohz_cpus=4-63
>
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames Min Max(Frame) Avg Sigma LastTrans Fliers(Frames)
> 4 1728000 0.0159 10.98 (1625029) 0.4708 0.6897 0 (0)
> 5 1728000 0.0159 11.94 (1612693) 0.6770 0.8055 0 (0)
> 6 1728000 0.0159 11.90 (1544891) 0.6773 0.6996 0 (0)
> 7 1728000 0.0159 13.10 (1357976) 0.5167 0.6947 0 (0)
> 8 1728000 0.0159 21.71 (1331577) 0.5297 0.8358 0 (0)
> 9 1728000 0.0159 21.68 (1331576) 0.6269 0.7978 0 (0)
> 10 1728000 0.0159 19.53 (889433) 0.8569 0.8690 0 (0)
> 11 1728000 0.0159 29.82 (1508316) 0.5446 0.7971 0 (0)
> 12 1728000 0.0159 24.81 (1502988) 0.5601 0.8393 0 (0)
> 13 1728000 0.0159 21.71 (1476396) 0.7357 0.8276 0 (0)
> 14 1728000 0.0159 12.86 (1178839) 0.5667 0.8542 0 (0)
> 15 1728000 0.0159 11.90 (1671304) 0.6289 0.6630 0 (0)
> 16 1728000 0.0159 23.35 (1493387) 0.7169 1.1261 0 (0)
> 17 1728000 0.0159 17.87 (1463627) 0.9685 1.3873 0 (0)
> 18 1728000 0.0159 16.91 (826801) 0.7671 1.0588 0 (0)
> 19 1728000 0.0159 19.06 (1238264) 0.6330 0.9972 0 (0)
> 20 1728000 0.0159 37.18 (270001) 1.3230 1.9480 0 (0) 15 (266593,266594,268369,268370,270001,..876386)
> 21 1728000 0.0159 40.52 (270769) 1.3299 1.6877 0 (0) 14 (266593,266594,268369,268370,270001,..876386)
> 22 1728000 0.0159 28.86 (273746) 1.3284 2.2402 0 (0)
> 23 1728000 0.0159 39.35 (270770) 1.2386 1.6255 0 (0) 13 (266593,266594,268369,268370,270001,..273746)
> 24 1800000 0.0725 36.41 (282051) 1.3319 1.7074 0 (0) 9 (279551,279552,281251,281252,282051,..285152)
> 25 1800000 0.0725 40.60 (281252) 1.5767 1.8895 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
> 26 1800000 0.0725 44.42 (281252) 1.4205 1.7594 0 (0) 16 (277701,277702,279551,279552,281251,..926361)
> 27 1800000 0.0725 42.27 (281252) 1.3342 1.7801 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
> 28 1800000 0.0725 43.08 (281251) 0.8064 1.2355 0 (0) 16 (277701,277702,279551,279552,281251,..290202)
> 29 1800000 0.0725 44.18 (281252) 0.9319 1.0451 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
> 30 1800000 0.0725 41.56 (279552) 1.5054 1.7734 0 (0) 19 (277701,277702,279551,279552,281251,..1379844)
> 31 1800000 0.0725 43.23 (285152) 0.8076 0.9825 0 (0) 14 (277701,277702,279551,279552,281251,..285152)
> 32 1800000 0.0725 64.21 (281252) 0.8604 1.6229 0 (0) 56 (277701,277702,279551,279552,281251,..420022)
> 33 1800000 0.0725 70.88 (281252) 0.8443 1.7623 0 (0) 843 (252701,264451,264452,269802,276752,..426672)
> 34 1800000 0.0725 74.46 (281252) 0.8727 1.8996 0 (0) 1741 (281251,281252,282051,282052,283601,..445622)
> 35 1800000 0.0725 50.14 (412212) 0.8905 2.0577 0 (0) 2716 (368532,369931,369932,372562,372581,..457072)
> 36 1800000 0.0725 31.16 (466021) 0.8785 0.8549 0 (0) 2 (466021,466022)
> 37 1800000 0.0725 44.75 (415861) 0.7989 1.3541 0 (0) 323 (408731,408732,411281,413861,413862,..490232)
> 38 1800000 0.0725 47.37 (411811) 0.9474 2.0748 0 (0) 2936 (400292,400371,400372,400771,400772,..495282)
> 39 1800000 0.0725 27.97 (466872) 0.8778 0.7751 0 (0)
> 40 1800000 0.0725 7.56 (152518) 0.6923 0.6879 0 (0)
> 41 1800000 0.0725 7.32 (146921) 0.6207 0.6168 0 (0)
> 42 1800000 0.0725 6.99 (1756819) 0.6773 0.6053 0 (0)
> 43 1800000 0.0725 6.27 (1275231) 0.8971 0.7166 0 (0)
> 44 540000 0.0032 22.18 (416117) 1.2127 1.3930 0 (0)
> 45 540000 0.0032 34.34 (124668) 1.3483 2.0767 0 (0) 5 (83864,84374,85574,124667,124668)
> 46 540000 0.0032 32.43 (123279) 1.2910 2.1807 0 (0) 7 (123278,123279,123353,123354,124928,..124953)
> 47 540000 0.0032 30.04 (83864) 1.2735 1.9704 0 (0) 1 (83864)
> 48 540000 0.0032 7.63 (233683) 0.8320 0.7331 0 (0)
> 49 540000 0.0032 5.73 (204097) 0.6421 0.5255 0 (0)
> 50 540000 0.0032 6.44 (72596) 0.5834 0.5951 0 (0)
> 51 540000 0.0032 6.67 (468680) 0.5008 0.5379 0 (0)
> 52 540000 0.0032 7.63 (394623) 0.6245 0.5303 0 (0)
> 53 540000 0.0032 5.96 (456858) 0.4830 0.4921 0 (0)
> 54 540000 0.0032 5.73 (361510) 0.7487 0.5892 0 (0)
> 55 540000 0.0032 5.96 (108106) 0.5493 0.5178 0 (0)
> 56 540000 0.0032 20.98 (539973) 1.1312 0.9690 0 (0)
> 57 540000 0.0032 7.15 (537416) 0.4435 0.4855 0 (0)
> 58 540000 0.0032 11.45 (537372) 0.5048 0.5684 0 (0)
> 59 540000 0.0032 15.02 (537417) 0.6277 0.8161 0 (0)
> 60 540000 0.0032 20.02 (511100) 0.7036 1.2132 0 (0)
> 61 540000 0.0032 28.14 (471348) 0.9544 1.3316 0 (0)
> 62 540000 0.0032 25.04 (461373) 0.7629 1.2334 0 (0)
> 63 540000 0.0032 17.88 (532206) 0.6224 1.2652 0 (0)
>
> 3.14.4-rt5 + patches, nohz_cpus=4-63, but with rt cpuset ticked
>
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames Min Max(Frame) Avg Sigma LastTrans Fliers(Frames)
> 4 1727990 0.0159 15.75 (1110991) 0.2705 0.6782 0 (0)
> 5 1727991 0.0159 18.10 (1152942) 0.2295 0.5379 0 (0)
> 6 1727991 0.0159 12.14 (740256) 0.1871 0.4767 0 (0)
> 7 1727991 0.0159 13.61 (1398636) 0.1943 0.4162 0 (0)
> 8 1727991 0.0159 21.68 (1128942) 0.2343 0.4782 0 (0)
> 9 1727991 0.0159 17.66 (1373532) 0.2449 0.6676 0 (0)
> 10 1727991 0.0159 21.68 (1370843) 0.3469 0.7723 0 (0)
> 11 1727993 0.0159 14.56 (1654553) 0.3085 0.5460 0 (0)
> 12 1727994 0.0159 16.91 (1640920) 0.6324 0.6797 0 (0)
> 13 1727994 0.0159 14.05 (1646008) 0.4396 0.5460 0 (0)
> 14 1727994 0.0159 17.90 (1385628) 0.1940 0.5126 0 (0)
> 15 1727994 0.0159 14.53 (1324235) 0.5829 0.6251 0 (0)
> 16 1727994 0.0159 18.61 (1170511) 0.2769 0.7185 0 (0)
> 17 1727995 0.0159 15.96 (147911) 0.6323 1.1918 0 (0)
> 18 1727995 0.0159 17.63 (1147518) 0.3379 0.8251 0 (0)
> 19 1727997 0.0159 16.67 (1110990) 0.3504 0.8739 0 (0)
> 20 1727997 0.0159 25.02 (1371899) 0.4171 0.9395 0 (0)
> 21 1727997 0.0159 18.10 (1604824) 0.5120 0.9121 0 (0)
> 22 1727997 0.0159 16.67 (1292219) 0.3707 1.0057 0 (0)
> 23 1727997 0.0159 21.92 (1311275) 0.7368 0.9864 0 (0)
> 24 1799997 0.0725 11.13 (513155) 0.2668 0.4618 0 (0)
> 25 1799997 0.0725 14.47 (57648) 1.3242 1.3780 0 (0)
> 26 1799997 0.0725 16.14 (790209) 0.3020 0.6119 0 (0)
> 27 1799999 0.0725 18.52 (406878) 0.9193 1.2727 0 (0)
> 28 1799999 0.0725 16.38 (1333567) 0.9898 1.0894 0 (0)
> 29 1799999 0.0725 17.57 (1388068) 1.5751 1.4865 0 (0)
> 30 1800000 0.0725 15.90 (944311) 0.8447 1.0841 0 (0)
> 31 1800000 0.0725 19.00 (1118689) 1.2406 1.2589 0 (0)
> 32 1800000 0.0725 33.21 (623752) 0.2034 0.5721 0 (0) 2 (623751,623752)
> 33 1800000 0.0725 43.80 (623751) 0.1869 0.5132 0 (0) 40 (555521,555522,558341,558342,561411,..634502)
> 34 1800000 0.0725 54.91 (623752) 0.2003 0.6696 0 (0) 256 (621921,621922,622231,622232,622251,..634652)
> 35 1800000 0.0725 11.61 (1689123) 0.2855 0.3485 0 (0)
> 36 1800000 0.0725 19.95 (662351) 0.1990 0.4114 0 (0)
> 37 1800000 0.0725 7.32 (775961) 0.1888 0.3463 0 (0)
> 38 1800000 0.0725 45.23 (623751) 0.1812 0.5458 0 (0) 72 (555521,555522,558341,558342,561411,..633852)
> 39 1800000 0.0725 6.13 (2023) 0.1899 0.3141 0 (0)
> 40 1800000 0.0725 3.03 (76948) 0.1348 0.1367 0 (0)
> 41 1800000 0.0725 3.27 (278393) 0.1473 0.1599 0 (0)
> 42 1800000 0.0725 3.74 (149791) 0.1490 0.1477 0 (0)
> 43 1800000 0.0725 3.74 (76948) 0.3163 0.2310 0 (0)
> 44 540000 0.0032 32.18 (187964) 0.2410 0.9227 0 (0) 19 (166655,166656,186674,186675,187289,..188639)
> 45 540000 0.0032 13.35 (118119) 0.9129 0.6304 0 (0)
> 46 540000 0.0032 38.87 (187290) 0.3234 1.1625 0 (0) 227 (167501,167502,168422,168423,168575,..190395)
> 47 540000 0.0032 10.97 (70989) 0.7904 0.5492 0 (0)
> 48 540000 0.0032 16.22 (526242) 0.1975 0.3578 0 (0)
> 49 540000 0.0032 4.06 (113415) 0.1632 0.2182 0 (0)
> 50 540000 0.0032 36.00 (348112) 1.0410 1.1357 0 (0) 4 (348112,348113,505655,505656)
> 51 540000 0.0032 7.63 (410141) 0.1802 0.3413 0 (0)
> 52 540000 0.0032 7.63 (388792) 0.1467 0.2892 0 (0)
> 53 540000 0.0032 22.41 (526242) 0.8564 1.0304 0 (0)
> 54 540000 0.0032 16.22 (526242) 0.3234 0.5915 0 (0)
> 55 540000 0.0032 20.51 (523077) 0.8910 1.0718 0 (0)
> 56 540000 0.0032 20.51 (433008) 0.4362 1.3129 0 (0)
> 57 540000 0.0032 23.84 (429227) 0.4652 1.3327 0 (0)
> 58 540000 0.0032 23.60 (409787) 0.3603 1.1818 0 (0)
> 59 540000 0.0032 21.69 (379973) 0.5904 1.3033 0 (0)
> 60 540000 0.0032 24.80 (365811) 0.6453 1.4206 0 (0)
> 61 540000 0.0032 26.23 (351861) 0.8988 1.2698 0 (0)
> 62 540000 0.0032 24.32 (354042) 0.9292 1.2590 0 (0)
> 63 540000 0.0032 24.80 (354441) 0.6686 1.3598 0 (0)
>
> 3.14.4-rt5 + patches, no nohz_full mask supplied, rt cpuset ticked
>
> FREQ=960 FRAMES=1728000 LOOP=50000 using CPUs 4 - 23
> FREQ=1000 FRAMES=1800000 LOOP=48000 using CPUs 24 - 43
> FREQ=300 FRAMES=540000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames Min Max(Frame) Avg Sigma LastTrans Fliers(Frames)
> 4 1727999 0.0159 19.77 (255927) 0.6208 0.8240 0 (0)
> 5 1727999 0.0159 5.02 (647067) 0.0868 0.1769 0 (0)
> 6 1727999 0.0159 5.23 (643274) 0.0783 0.1614 0 (0)
> 7 1728000 0.0159 5.26 (645483) 0.0940 0.1667 0 (0)
> 8 1728000 0.0159 5.23 (1308493) 0.0928 0.2128 0 (0)
> 9 1728000 0.0159 4.31 (7073) 0.0866 0.1984 0 (0)
> 10 1728000 0.0159 4.78 (1409) 0.0857 0.2031 0 (0)
> 11 1728000 0.0159 6.18 (1177115) 0.0946 0.2038 0 (0)
> 12 1728000 0.0159 5.02 (1607995) 0.0921 0.2065 0 (0)
> 13 1728000 0.0159 5.98 (1164828) 0.1021 0.2268 0 (0)
> 14 1728000 0.0159 7.61 (1143227) 0.1055 0.2670 0 (0)
> 15 1728000 0.0159 5.94 (1122923) 0.1346 0.2006 0 (0)
> 16 1728000 0.0159 5.98 (285214) 0.1058 0.2706 0 (0)
> 17 1728000 0.0159 9.04 (1143131) 0.1198 0.3034 0 (0)
> 18 1728000 0.0159 5.98 (962842) 0.0934 0.2315 0 (0)
> 19 1728000 0.0159 5.74 (1747) 0.1115 0.2409 0 (0)
> 20 1728000 0.0159 5.94 (264838) 0.0931 0.2247 0 (0)
> 21 1728000 0.0159 7.88 (389138) 0.1144 0.2702 0 (0)
> 22 1728000 0.0159 7.85 (588413) 0.1962 0.2656 0 (0)
> 23 1728000 0.0159 5.98 (1110060) 0.1984 0.3365 0 (0)
> 24 1800000 0.0725 2.79 (796117) 0.1595 0.1585 0 (0)
> 25 1800000 0.0725 2.31 (316117) 0.1086 0.0560 0 (0)
> 26 1800000 0.0725 2.46 (796118) 0.1074 0.0544 0 (0)
> 27 1800000 0.0725 3.98 (613155) 0.1087 0.0699 0 (0)
> 28 1800000 0.0725 3.17 (613156) 0.1085 0.0585 0 (0)
> 29 1800000 0.0725 8.18 (613156) 0.2171 0.2721 0 (0)
> 30 1800000 0.0725 7.94 (612931) 0.2242 0.2651 0 (0)
> 31 1800000 0.0725 7.94 (612931) 0.2499 0.3039 0 (0)
> 32 1800000 0.0725 3.03 (1756117) 0.1260 0.0819 0 (0)
> 33 1800000 0.0725 3.27 (1085885) 0.5809 0.4470 0 (0)
> 34 1800000 0.0725 2.55 (316117) 0.1056 0.0504 0 (0)
> 35 1800000 0.0725 3.27 (1965) 0.2121 0.4379 0 (0)
> 36 1800000 0.0725 4.13 (1) 0.1324 0.0874 0 (0)
> 37 1800000 0.0725 4.13 (1) 0.1808 0.1423 0 (0)
> 38 1800000 0.0725 4.13 (1) 0.2027 0.1488 0 (0)
> 39 1800000 0.0725 4.13 (1) 0.2094 0.1533 0 (0)
> 40 1800000 0.0725 2.93 (316118) 0.1093 0.0742 0 (0)
> 41 1800000 0.0725 3.03 (1756116) 0.1971 0.1469 0 (0)
> 42 1800000 0.0725 3.27 (1210118) 0.6525 0.5411 0 (0)
> 43 1800000 0.0725 3.03 (316117) 0.1048 0.0498 0 (0)
> 44 539999 0.0032 2.86 (94835) 0.0828 0.1364 0 (0)
> 45 539999 0.0032 2.62 (94834) 0.1914 0.1657 0 (0)
> 46 540000 0.0032 3.10 (94834) 0.2435 0.1900 0 (0)
> 47 540000 0.0032 2.38 (94834) 0.2505 0.1965 0 (0)
> 48 540000 0.0032 3.82 (524593) 0.1307 0.2533 0 (0)
> 49 540000 0.0032 2.86 (447946) 0.0904 0.2040 0 (0)
> 50 540000 0.0032 3.34 (434056) 0.2087 0.2865 0 (0)
> 51 540000 0.0032 3.10 (94835) 0.0921 0.2016 0 (0)
> 52 540000 0.0032 7.39 (522302) 0.3460 0.3597 0 (0)
> 53 540000 0.0032 7.39 (522302) 0.3449 0.3567 0 (0)
> 54 540000 0.0032 7.15 (522302) 0.3259 0.3550 0 (0)
> 55 540000 0.0032 7.39 (522302) 0.3274 0.3578 0 (0)
> 56 540000 0.0032 6.20 (387845) 0.1547 0.3660 0 (0)
> 57 540000 0.0032 7.87 (367398) 0.1425 0.3847 0 (0)
> 58 540000 0.0032 7.63 (347661) 0.1272 0.3915 0 (0)
> 59 540000 0.0032 9.30 (347660) 0.1239 0.3552 0 (0)
> 60 540000 0.0032 12.16 (152143) 0.3169 0.3407 0 (0)
> 61 540000 0.0032 10.02 (152143) 0.3359 0.3369 0 (0)
> 62 540000 0.0032 12.16 (152143) 0.3347 0.3341 0 (0)
> 63 540000 0.0032 12.40 (152143) 0.2970 0.3460 0 (0)

^ permalink raw reply	[flat|nested] 31+ messages in thread
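Mike's harness, as he describes it above, "is not picky": any frame whose jitter exceeds 30us is counted as a 'Flier' and its frame number tagged. The bookkeeping can be sketched in a few lines; this is an illustrative reconstruction, not the actual benchmark source, with the threshold and statistics taken from the columns above.

```python
# Illustrative reconstruction of the flier bookkeeping described above:
# frames whose jitter exceeds the threshold (30us in Mike's model) are
# counted and their frame numbers recorded. Not the actual harness code.
FLIER_THRESHOLD_US = 30.0

def frame_stats(jitters_us):
    """Return (min, max, avg, flier_count, flier_frames) for one CPU's trace."""
    fliers = [frame for frame, j in enumerate(jitters_us)
              if j > FLIER_THRESHOLD_US]
    avg = sum(jitters_us) / len(jitters_us)
    return min(jitters_us), max(jitters_us), avg, len(fliers), fliers

# Example trace: frames 2 and 4 exceed the 30us threshold.
print(frame_stats([0.5, 1.2, 36.7, 0.8, 41.0]))
```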
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18  5:20               ` Paul E. McKenney
@ 2014-05-18  8:36                 ` Mike Galbraith
  2014-05-18 15:58                   ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-18  8:36 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
      Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:

> If you are saying that turning on nohz_full doesn't help unless you
> also ensure that there is only one runnable task per CPU, I completely
> agree.  If you are saying something else, you lost me.  ;-)

Yup, that's it, more or less.  It's not only single-task loads that could
benefit from better isolation; the trouble is that if isolation-improving
measures are tied to nohz_full, other sensitive loads will suffer if they
try to use those improvements.

	-Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
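Paul's "only one runnable task per CPU" condition can be checked from userspace by scanning /proc. The sketch below is a hypothetical helper, not something from the thread; it counts runnable tasks per CPU using the state and `processor` (last-ran CPU) fields of /proc/&lt;pid&gt;/stat.

```python
import os
from collections import Counter

def runnable_per_cpu():
    """Count tasks in state 'R' per CPU, read from /proc/<pid>/stat.

    After stripping the "pid (comm)" prefix, field 0 is the task state
    and field 36 is 'processor', the CPU the task last ran on.
    """
    counts = Counter()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/stat" % pid) as f:
                # comm may contain spaces/parens, so split after the last ')'
                fields = f.read().rsplit(")", 1)[1].split()
            if fields[0] == "R":
                counts[int(fields[36])] += 1
        except (OSError, IndexError, ValueError):
            continue  # task exited mid-scan, or malformed entry
    return counts

# A nohz_full CPU can only keep its tick stopped while its count here is <= 1.
print(runnable_per_cpu())
```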
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18  8:36                 ` Mike Galbraith
@ 2014-05-18 15:58                   ` Paul E. McKenney
  2014-05-19  2:44                     ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-18 15:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
      Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
>
> > If you are saying that turning on nohz_full doesn't help unless you
> > also ensure that there is only one runnable task per CPU, I completely
> > agree.  If you are saying something else, you lost me.  ;-)
>
> Yup, that's it more or less.  It's not only single task loads that could
> benefit from better isolation, but if isolation improving measures are
> tied to nohz_full, other sensitive loads will suffer if they try to use
> isolation improvements.

So you are arguing for a separate Kconfig variable that does the isolation?
So that NO_HZ_FULL selects this new variable, and (for example) RCU
uses this new variable to decide when to pin the grace-period kthreads
onto the housekeeping CPU?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread
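The housekeeping-vs-isolated split Paul refers to is driven by the nohz_full= boot mask (e.g. nohz_full=4-63 in Mike's runs). Here is a small sketch of how such a kernel-style cpulist partitions the machine; the helper names are invented for illustration and this is not the kernel's actual parser.

```python
def parse_cpulist(spec):
    """Parse a kernel-style cpulist such as "4-63" or "1,3,5-7" into a set."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus

def housekeeping_cpus(nr_cpus, nohz_full_spec):
    """CPUs left over for timekeeping, RCU grace-period kthreads, etc."""
    return set(range(nr_cpus)) - parse_cpulist(nohz_full_spec)

# With Mike's nohz_full=4-63 on a 64-core box, CPUs 0-3 do the housekeeping.
print(sorted(housekeeping_cpus(64, "4-63")))  # -> [0, 1, 2, 3]
```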
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-18 15:58                   ` Paul E. McKenney
@ 2014-05-19  2:44                     ` Mike Galbraith
  2014-05-19  5:34                       ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2014-05-19  2:44 UTC (permalink / raw)
  To: paulmck
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
      Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote:
> On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> >
> > > If you are saying that turning on nohz_full doesn't help unless you
> > > also ensure that there is only one runnable task per CPU, I completely
> > > agree.  If you are saying something else, you lost me.  ;-)
> >
> > Yup, that's it more or less.  It's not only single task loads that could
> > benefit from better isolation, but if isolation improving measures are
> > tied to nohz_full, other sensitive loads will suffer if they try to use
> > isolation improvements.
>
> So you are arguing for a separate Kconfig variable that does the isolation?
> So that NO_HZ_FULL selects this new variable, and (for example) RCU
> uses this new variable to decide when to pin the grace-period kthreads
> onto the housekeeping CPU?

I'm thinking more about runtime, but yes.

The tick mode really wants to be selectable per set (in my boxen you can
switch between nohz off/idle, but not yet nohz_full; that might get real
interesting).  You saw in my numbers that ticked is far better for the
threaded rt load, but what if the total load has both sensitive rt and
compute components to worry about?  The rt component wants relief from
the jitter that flipping the tick inflicts, but also wants as little
disturbance as possible, so RCU offload and whatever other measures are
or become available are perhaps interesting to it as well.  The numbers
showed that, here and now, the two modes can work together in the same
box: I can have my rt set ticking away and other cores doing tickless
compute.  But enabling that via a common config (distros don't want to
ship many kernel flavors) has a cost to rt performance.

Ideally, bean counting would be switchable too, giving all components
the environment they like best.

	-Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-19  2:44                     ` Mike Galbraith
@ 2014-05-19  5:34                       ` Paul E. McKenney
  2014-05-20 14:53                         ` Frederic Weisbecker
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2014-05-19  5:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar,
      Peter Zijlstra, Steven Rostedt, Thomas Gleixner, fweisbec

On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote:
> On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote:
> > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote:
> > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote:
> > >
> > > > If you are saying that turning on nohz_full doesn't help unless you
> > > > also ensure that there is only one runnable task per CPU, I completely
> > > > agree.  If you are saying something else, you lost me.  ;-)
> > >
> > > Yup, that's it more or less.  It's not only single task loads that could
> > > benefit from better isolation, but if isolation improving measures are
> > > tied to nohz_full, other sensitive loads will suffer if they try to use
> > > isolation improvements.
> >
> > So you are arguing for a separate Kconfig variable that does the isolation?
> > So that NO_HZ_FULL selects this new variable, and (for example) RCU
> > uses this new variable to decide when to pin the grace-period kthreads
> > onto the housekeeping CPU?
>
> I'm thinking more about runtime, but yes.
>
> The tick mode really wants to be selectable per set (in my boxen you can
> switch between nohz off/idle, but not yet nohz_full, that might get real
> interesting).  You saw in my numbers that ticked is far better for the
> threaded rt load, but what if the total load has both sensitive rt and
> compute components to worry about?  The rt component wants relief from
> the jitter that flipping the tick inflicts, but also wants as little
> disturbance as possible, so RCU offload and whatever other measures that
> are or become available are perhaps interesting to it as well.  The
> numbers showed that here and now the two modes can work together in the
> same box, I can have my rt set ticking away, and other cores doing
> tickless compute, but enabling that via common config (distros don't
> want to ship many kernel flavors) has a cost to rt performance.
>
> Ideally, bean counting would be switchable too, giving all components
> the environment they like best.

Sounds like a question for Frederic (now CCed).  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-19 5:34 ` Paul E. McKenney @ 2014-05-20 14:53 ` Frederic Weisbecker 2014-05-20 15:53 ` Paul E. McKenney 2014-05-21 3:52 ` Mike Galbraith 0 siblings, 2 replies; 31+ messages in thread From: Frederic Weisbecker @ 2014-05-20 14:53 UTC (permalink / raw) To: Paul E. McKenney Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote: > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote: > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote: > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote: > > > > > > > > > If you are saying that turning on nohz_full doesn't help unless you > > > > > also ensure that there is only one runnable task per CPU, I completely > > > > > agree. If you are saying something else, you lost me. ;-) > > > > > > > > Yup, that's it more or less. It's not only single task loads that could > > > > benefit from better isolation, but if isolation improving measures are > > > > tied to nohz_full, other sensitive loads will suffer if they try to use > > > > isolation improvements. > > > > > > So you are arguing for a separate Kconfig variable that does the isolation? > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU > > > uses this new variable to decide when to pin the grace-period kthreads > > > onto the housekeeping CPU? > > > > I'm thinking more about runtime, but yes. > > > > The tick mode really wants to be selectable per set (in my boxen you can > > switch between nohz off/idle, but not yet nohz_full, that might get real > > interesting). You saw in my numbers that ticked is far better for the > > threaded rt load, but what if the total load has both sensitive rt and > > compute components to worry about? 
> > The rt component wants relief from the jitter that flipping the tick inflicts, but also wants as little disturbance as possible, so RCU offload and whatever other measures that are or become available are perhaps interesting to it as well. The numbers showed that here and now the two modes can work together in the same box, I can have my rt set ticking away, and other cores doing tickless compute, but enabling that via common config (distros don't want to ship many kernel flavors) has a cost to rt performance.
> >
> > Ideally, bean counting would be switchable too, giving all components the environment they like best.
>
> Sounds like a question for Frederic (now CCed). ;-)

I'm not sure that I really understand what you want here.

The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks is actually off by default. This is only overridden by the "nohz_full=" boot parameter.

Now if what you need is to enable or disable it at runtime instead of boot time, I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched and RCU). I've already been eyed by vulturous frozen sharks flying in circles above me lately after a few overengineering visions.

And given that the full nohz code is still in its infancy, it's probably not the right time to expand it that way. I haven't yet heard of users who have made it past the testing stage of full nohz.

We'll probably extend it that way in the future. But likely not in the near future.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 14:53 ` Frederic Weisbecker @ 2014-05-20 15:53 ` Paul E. McKenney 2014-05-20 16:24 ` Frederic Weisbecker 2014-05-21 4:18 ` Mike Galbraith 2014-05-21 3:52 ` Mike Galbraith 1 sibling, 2 replies; 31+ messages in thread From: Paul E. McKenney @ 2014-05-20 15:53 UTC (permalink / raw) To: Frederic Weisbecker Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote: > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote: > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote: > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote: > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote: > > > > > > > > > > > If you are saying that turning on nohz_full doesn't help unless you > > > > > > also ensure that there is only one runnable task per CPU, I completely > > > > > > agree. If you are saying something else, you lost me. ;-) > > > > > > > > > > Yup, that's it more or less. It's not only single task loads that could > > > > > benefit from better isolation, but if isolation improving measures are > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use > > > > > isolation improvements. > > > > > > > > So you are arguing for a separate Kconfig variable that does the isolation? > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU > > > > uses this new variable to decide when to pin the grace-period kthreads > > > > onto the housekeeping CPU? > > > > > > I'm thinking more about runtime, but yes. > > > > > > The tick mode really wants to be selectable per set (in my boxen you can > > > switch between nohz off/idle, but not yet nohz_full, that might get real > > > interesting). 
You saw in my numbers that ticked is far better for the threaded rt load, but what if the total load has both sensitive rt and compute components to worry about? The rt component wants relief from the jitter that flipping the tick inflicts, but also wants as little disturbance as possible, so RCU offload and whatever other measures that are or become available are perhaps interesting to it as well. The numbers showed that here and now the two modes can work together in the same box, I can have my rt set ticking away, and other cores doing tickless compute, but enabling that via common config (distros don't want to ship many kernel flavors) has a cost to rt performance.
> > >
> > > Ideally, bean counting would be switchable too, giving all components the environment they like best.
> >
> > Sounds like a question for Frederic (now CCed). ;-)
>
> I'm not sure that I really understand what you want here.
>
> The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks is actually off by default. This is only overridden by the "nohz_full=" boot parameter.

If I understand correctly, if there is no nohz_full= boot parameter, then the context-tracking code takes the early exit via the context_tracking_is_enabled() check in context_tracking_user_enter(). I would not expect this to cause much in the way of syscall performance degradation. However, it looks like having even one CPU in nohz_full mode causes all CPUs to enable context tracking.

My guess is that Mike wants to have (say) half of his CPUs running nohz_full, and the other half having fast system calls. So my guess also is that he would like some way for the non-nohz_full CPUs to opt out of the context-tracking overhead, including the memory barriers and atomic ops in rcu_user_enter() and rcu_user_exit(). ;-)

> Now if what you need is to enable or disable it at runtime instead of boottime, I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched and RCU).

What Frederic said! Making RCU deal with this is possible, but a bit on the complicated side. Given that I haven't heard too many people complaining that RCU is too simple, I would like to opt out of runtime changes to the nohz_full mask.

> I've already been eyed by vulturous frozen sharks flying in circles above me lately after a few overengineering visions.

Nothing like the icy glare of a frozen shark, is there? ;-)

> And given that the full nohz code is still in its infancy, it's probably not the right time to expand it that way. I haven't yet heard of users who have made it past the testing stage of full nohz.
>
> We'll probably extend it that way in the future. But likely not in the near future.

My guess is that Mike would be OK with making the nohz_full choice of CPUs still at boot time, but that he would like the CPUs that are not to be in the nohz_full state to be able to opt out of the context-tracking overhead.

Mike, please let us all know if I am misunderstanding what you are looking for.

							Thanx, Paul

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 15:53 ` Paul E. McKenney @ 2014-05-20 16:24 ` Frederic Weisbecker 2014-05-20 16:36 ` Peter Zijlstra 2014-05-20 17:20 ` Paul E. McKenney 1 sibling, 2 replies; 31+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 16:24 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote:
> On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote:
> > I'm not sure that I really understand what you want here.
> >
> > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks is actually off by default. This is only overridden by the "nohz_full=" boot parameter.
>
> If I understand correctly, if there is no nohz_full= boot parameter, then the context-tracking code takes the early exit via the context_tracking_is_enabled() check in context_tracking_user_enter().

Exactly. It's even jump labeled, so in the better arch-support case it should reduce to a single unconditional jump when it's off.

> I would not expect this to cause much in the way of syscall performance degradation.

Now the jump label concerns all cases but syscalls (exceptions and irqs). Syscalls are even better optimized in the off case with a TIF_NOHZ flag, so the check folds into the all-in-one slow-path condition, at least on x86.

> However, it looks like having even one CPU in nohz_full mode causes all CPUs to enable context tracking.

True, unfortunately. It's necessary to track syscall and exception entry/exit across CPUs. So if CPU 1 is full nohz and a task enters userspace on CPU 0 and then migrates to CPU 1, we must know there that it's resuming in userspace in order to stop the tick confidently. So CPU 0 must do context tracking as well.
Of course one can argue that we can find out that the task is resuming in userspace from CPU 0 scheduler entry without the need for previous context tracking, but I couldn't find a safe solution for that. This is because probing on user/kernel boundaries can only be done in the soft way, through explicit function calls. So there is an inevitable shift between soft and hard boundaries, between what we probe and what we can guess.

> My guess is that Mike wants to have (say) half of his CPUs running nohz_full, and the other half having fast system calls. So my guess also is that he would like some way for the non-nohz_full CPUs to opt out of the context-tracking overhead, including the memory barriers and atomic ops in rcu_user_enter() and rcu_user_exit(). ;-)

I see. So we could possibly restrict the context tracking to a subset of CPUs, but only if the tasks running there can't run on non-tracking CPUs. Ah, one possible thing is to rely on the TIF_NOHZ flag for that and check which tasks need to be tracked.

> > Now if what you need is to enable or disable it at runtime instead of boottime, I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched and RCU).
>
> What Frederic said! Making RCU deal with this is possible, but a bit on the complicated side. Given that I haven't heard too many people complaining that RCU is too simple, I would like to opt out of runtime changes to the nohz_full mask.

Agreed.

> > I've already been eyed by vulturous frozen sharks flying in circles above me lately after a few overengineering visions.
>
> Nothing like the icy glare of a frozen shark, is there? ;-)

I think they were even three-eyed!!!

> > And given that the full nohz code is still in its infancy, it's probably not the right time to expand it that way. I haven't yet heard of users who have made it past the testing stage of full nohz.
> >
> > We'll probably extend it that way in the future.
> > But likely not in the near future.
>
> My guess is that Mike would be OK with making the nohz_full choice of CPUs still at boot time, but that he would like the CPUs that are not to be in the nohz_full state to be able to opt out of the context-tracking overhead.

Ok, that might be possible, although it still requires a bit of complication. Let's wait for Mike's input.

Thanks.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 16:24 ` Frederic Weisbecker @ 2014-05-20 16:36 ` Peter Zijlstra 2014-05-20 17:20 ` Paul E. McKenney 1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-20 16:36 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Paul E. McKenney, Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Steven Rostedt, Thomas Gleixner

On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote:
> Of course one can argue that we can find out that the task is resuming in userspace from CPU 0 scheduler entry without the need for previous context tracking, but I couldn't find a safe solution for that. This is because probing on user/kernel boundaries can only be done in the soft way, through explicit function calls. So there is an inevitable shift between soft and hard boundaries, between what we probe and what we can guess.

You can hook into set_task_cpu(); not sure it's going to be pretty, but that is _the_ place to hook migration-related nonsense.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 16:24 ` Frederic Weisbecker 2014-05-20 16:36 ` Peter Zijlstra @ 2014-05-20 17:20 ` Paul E. McKenney 2014-05-21 4:29 ` Mike Galbraith 1 sibling, 1 reply; 31+ messages in thread From: Paul E. McKenney @ 2014-05-20 17:20 UTC (permalink / raw) To: Frederic Weisbecker Cc: Mike Galbraith, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote: > On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote: > > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote: [ . . . ] > > > We'll probably extend it that way in the future. But likely not in a near future. > > > > My guess is that Mike would be OK with making nohz_full choice of CPUs > > still at boot time, but that he would like the CPUs that are not to be > > in nohz_full state be able to opt out of the context-tracking overhead. > > Ok that might be possible. Although still require a bit of complication. > Lets wait for Mike input. Sounds good! Mike, would this do what you need? Thanx, Paul ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 17:20 ` Paul E. McKenney @ 2014-05-21 4:29 ` Mike Galbraith 0 siblings, 0 replies; 31+ messages in thread From: Mike Galbraith @ 2014-05-21 4:29 UTC (permalink / raw) To: paulmck Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Tue, 2014-05-20 at 10:20 -0700, Paul E. McKenney wrote: > On Tue, May 20, 2014 at 06:24:36PM +0200, Frederic Weisbecker wrote: > > On Tue, May 20, 2014 at 08:53:24AM -0700, Paul E. McKenney wrote: > > > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote: > > [ . . . ] > > > > > We'll probably extend it that way in the future. But likely not in a near future. > > > > > > My guess is that Mike would be OK with making nohz_full choice of CPUs > > > still at boot time, but that he would like the CPUs that are not to be > > > in nohz_full state be able to opt out of the context-tracking overhead. > > > > Ok that might be possible. Although still require a bit of complication. > > Lets wait for Mike input. > > Sounds good! > > Mike, would this do what you need? I don't _have_ a here and now need at all, I'm just looking at the possibilities. For the users I'm aware of here and now, I'm pretty sure they'd be tickled pink with it as it sits ('course tickled pink will quickly become "I see a 1.073us perturbation once every three weeks"). -Mike ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 15:53 ` Paul E. McKenney 2014-05-20 16:24 ` Frederic Weisbecker @ 2014-05-21 4:18 ` Mike Galbraith 2014-05-21 12:03 ` Paul E. McKenney 1 sibling, 1 reply; 31+ messages in thread From: Mike Galbraith @ 2014-05-21 4:18 UTC (permalink / raw) To: paulmck Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Tue, 2014-05-20 at 08:53 -0700, Paul E. McKenney wrote: > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote: > > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote: > > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote: > > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: > > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote: > > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote: > > > > > > > > > > > > > If you are saying that turning on nohz_full doesn't help unless you > > > > > > > also ensure that there is only one runnable task per CPU, I completely > > > > > > > agree. If you are saying something else, you lost me. ;-) > > > > > > > > > > > > Yup, that's it more or less. It's not only single task loads that could > > > > > > benefit from better isolation, but if isolation improving measures are > > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use > > > > > > isolation improvements. > > > > > > > > > > So you are arguing for a separate Kconfig variable that does the isolation? > > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU > > > > > uses this new variable to decide when to pin the grace-period kthreads > > > > > onto the housekeeping CPU? > > > > > > > > I'm thinking more about runtime, but yes. 
> > > > > > > > The tick mode really wants to be selectable per set (in my boxen you can > > > > switch between nohz off/idle, but not yet nohz_full, that might get real > > > > interesting). You saw in my numbers that ticked is far better for the > > > > threaded rt load, but what if the total load has both sensitive rt and > > > > compute components to worry about? The rt component wants relief from > > > > the jitter that flipping the tick inflicts, but also wants as little > > > > disturbance as possible, so RCU offload and whatever other measures that > > > > are or become available are perhaps interesting to it as well. The > > > > numbers showed that here and now the two modes can work together in the > > > > same box, I can have my rt set ticking away, and other cores doing > > > > tickless compute, but enabling that via common config (distros don't > > > > want to ship many kernel flavors) has a cost to rt performance. > > > > > > > > Ideally, bean counting would be switchable too, giving all components > > > > the environment they like best. > > > > > > Sounds like a question for Frederic (now CCed). ;-) > > > > I'm not sure that I really understand what you want here. > > > > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks > > is actually off by default. This is only overriden by "nohz_full=" boot parameter. > > If I understand correctly, if there is no nohz_full= boot parameter, > then the context-tracking code takes the early exit via the > context_tracking_is_enabled() check in context_tracking_user_enter(). > I would not expect this to cause much in the way of syscall performance > degradation. However, it looks like having even one CPU in nohz_full > mode causes all CPUs to enable context tracking. > > My guess is that Mike wants to have (say) half of his CPUs running > nohz_full, and the other half having fast system calls. 
> So my guess also is that he would like some way for the non-nohz_full CPUs to opt out of the context-tracking overhead, including the memory barriers and atomic ops in rcu_user_enter() and rcu_user_exit(). ;-)

Bingo.

> > Now if what you need is to enable or disable it at runtime instead of boottime, I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched and RCU).
>
> What Frederic said! Making RCU deal with this is possible, but a bit on the complicated side. Given that I haven't heard too many people complaining that RCU is too simple, I would like to opt out of runtime changes to the nohz_full mask.
>
> > I've already been eyed by vulturous frozen sharks flying in circles above me lately after a few overengineering visions.
>
> Nothing like the icy glare of a frozen shark, is there? ;-)
>
> > And given that the full nohz code is still in its infancy, it's probably not the right time to expand it that way. I haven't yet heard of users who have made it past the testing stage of full nohz.
> >
> > We'll probably extend it that way in the future. But likely not in the near future.
>
> My guess is that Mike would be OK with making the nohz_full choice of CPUs still at boot time, but that he would like the CPUs that are not to be in the nohz_full state to be able to opt out of the context-tracking overhead.
>
> Mike, please let us all know if I am misunderstanding what you are looking for.

Yup, exactly. As it sits, you couldn't possibly ship nohz_full out to the real world in any other form than a specialty kernel. There's no doubt in my mind that there are users out there who would love to have high performance rt and compute in the same box though. I can imagine them lurking here and slobbering profusely ;-)

-Mike

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-21 4:18 ` Mike Galbraith @ 2014-05-21 12:03 ` Paul E. McKenney 0 siblings, 0 replies; 31+ messages in thread From: Paul E. McKenney @ 2014-05-21 12:03 UTC (permalink / raw) To: Mike Galbraith Cc: Frederic Weisbecker, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Wed, May 21, 2014 at 06:18:01AM +0200, Mike Galbraith wrote: > On Tue, 2014-05-20 at 08:53 -0700, Paul E. McKenney wrote: > > On Tue, May 20, 2014 at 04:53:52PM +0200, Frederic Weisbecker wrote: > > > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote: > > > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote: > > > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: > > > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote: > > > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote: > > > > > > > > > > > > > > > If you are saying that turning on nohz_full doesn't help unless you > > > > > > > > also ensure that there is only one runnable task per CPU, I completely > > > > > > > > agree. If you are saying something else, you lost me. ;-) > > > > > > > > > > > > > > Yup, that's it more or less. It's not only single task loads that could > > > > > > > benefit from better isolation, but if isolation improving measures are > > > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use > > > > > > > isolation improvements. > > > > > > > > > > > > So you are arguing for a separate Kconfig variable that does the isolation? > > > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU > > > > > > uses this new variable to decide when to pin the grace-period kthreads > > > > > > onto the housekeeping CPU? > > > > > > > > > > I'm thinking more about runtime, but yes. 
> > > > > > > > > > The tick mode really wants to be selectable per set (in my boxen you can > > > > > switch between nohz off/idle, but not yet nohz_full, that might get real > > > > > interesting). You saw in my numbers that ticked is far better for the > > > > > threaded rt load, but what if the total load has both sensitive rt and > > > > > compute components to worry about? The rt component wants relief from > > > > > the jitter that flipping the tick inflicts, but also wants as little > > > > > disturbance as possible, so RCU offload and whatever other measures that > > > > > are or become available are perhaps interesting to it as well. The > > > > > numbers showed that here and now the two modes can work together in the > > > > > same box, I can have my rt set ticking away, and other cores doing > > > > > tickless compute, but enabling that via common config (distros don't > > > > > want to ship many kernel flavors) has a cost to rt performance. > > > > > > > > > > Ideally, bean counting would be switchable too, giving all components > > > > > the environment they like best. > > > > > > > > Sounds like a question for Frederic (now CCed). ;-) > > > > > > I'm not sure that I really understand what you want here. > > > > > > The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks > > > is actually off by default. This is only overriden by "nohz_full=" boot parameter. > > > > If I understand correctly, if there is no nohz_full= boot parameter, > > then the context-tracking code takes the early exit via the > > context_tracking_is_enabled() check in context_tracking_user_enter(). > > I would not expect this to cause much in the way of syscall performance > > degradation. However, it looks like having even one CPU in nohz_full > > mode causes all CPUs to enable context tracking. > > > > My guess is that Mike wants to have (say) half of his CPUs running > > nohz_full, and the other half having fast system calls. 
So my guess > > also is that he would like some way of having the non-nohz_full CPUs > > to opt out of the context-tracking overhead, including the memory > > barriers and atomic ops in rcu_user_enter() and rcu_user_exit(). ;-) > > Bingo. > > > > Now if what you need is to enable or disable it at runtime instead of boottime, > > > I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched > > > and RCU). > > > > What Frederic said! Making RCU deal with this is possible, but a bit on > > the complicated side. Given that I haven't heard too many people complaining > > that RCU is too simple, I would like to opt out of runtime changes to the > > nohz_full mask. > > > > > I've already been eyed by vulturous frozen sharks flying in circles above me lately > > > after a few overengineering visions. > > > > Nothing like the icy glare of a frozen shark, is there? ;-) > > > > > And given that the full nohz code is still in a baby shape, it's probably not the right > > > time to expand it that way. I haven't even yet heard about users who crossed the testing > > > stage of full nohz. > > > > > > We'll probably extend it that way in the future. But likely not in a near future. > > > > My guess is that Mike would be OK with making nohz_full choice of CPUs > > still at boot time, but that he would like the CPUs that are not to be > > in nohz_full state be able to opt out of the context-tracking overhead. > > > > Mike, please let us all know if I am misunderstanding what you are > > looking for. > > Yup, exactly. As it sits, you couldn't possible ship nohz_full out to > the real world in any other form than a specialty kernel. There's no > doubt in my mind that there are users out there who would love to have > high performance rt and compute in the same box though. 
> I can imagine them lurking here and slobbering profusely ;-)

I think that shipping nohz_full out to the real world is already happening, but that turning on nohz_full at boot time is expected to clobber syscall performance globally. Still, I can see the attraction of avoiding clobbering the syscall performance on the non-nohz_full CPUs when nohz_full is enabled on only some of the CPUs.

							Thanx, Paul

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile 2014-05-20 14:53 ` Frederic Weisbecker 2014-05-20 15:53 ` Paul E. McKenney @ 2014-05-21 3:52 ` Mike Galbraith 1 sibling, 0 replies; 31+ messages in thread From: Mike Galbraith @ 2014-05-21 3:52 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paul E. McKenney, Paul Gortmaker, linux-kernel, linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Thomas Gleixner On Tue, 2014-05-20 at 16:53 +0200, Frederic Weisbecker wrote: > On Sun, May 18, 2014 at 10:34:01PM -0700, Paul E. McKenney wrote: > > On Mon, May 19, 2014 at 04:44:41AM +0200, Mike Galbraith wrote: > > > On Sun, 2014-05-18 at 08:58 -0700, Paul E. McKenney wrote: > > > > On Sun, May 18, 2014 at 10:36:41AM +0200, Mike Galbraith wrote: > > > > > On Sat, 2014-05-17 at 22:20 -0700, Paul E. McKenney wrote: > > > > > > > > > > > If you are saying that turning on nohz_full doesn't help unless you > > > > > > also ensure that there is only one runnable task per CPU, I completely > > > > > > agree. If you are saying something else, you lost me. ;-) > > > > > > > > > > Yup, that's it more or less. It's not only single task loads that could > > > > > benefit from better isolation, but if isolation improving measures are > > > > > tied to nohz_full, other sensitive loads will suffer if they try to use > > > > > isolation improvements. > > > > > > > > So you are arguing for a separate Kconfig variable that does the isolation? > > > > So that NO_HZ_FULL selects this new variable, and (for example) RCU > > > > uses this new variable to decide when to pin the grace-period kthreads > > > > onto the housekeeping CPU? > > > > > > I'm thinking more about runtime, but yes. > > > > > > The tick mode really wants to be selectable per set (in my boxen you can > > > switch between nohz off/idle, but not yet nohz_full, that might get real > > > interesting). 
You saw in my numbers that ticked is far better for the threaded rt load, but what if the total load has both sensitive rt and compute components to worry about? The rt component wants relief from the jitter that flipping the tick inflicts, but also wants as little disturbance as possible, so RCU offload and whatever other measures that are or become available are perhaps interesting to it as well. The numbers showed that here and now the two modes can work together in the same box, I can have my rt set ticking away, and other cores doing tickless compute, but enabling that via common config (distros don't want to ship many kernel flavors) has a cost to rt performance.
> > >
> > > Ideally, bean counting would be switchable too, giving all components the environment they like best.
> >
> > Sounds like a question for Frederic (now CCed). ;-)
>
> I'm not sure that I really understand what you want here.
>
> The current state of the art is that when you enable CONFIG_NO_HZ_FULL=y, full dynticks is actually off by default. This is only overridden by the "nohz_full=" boot parameter.
>
> Now if what you need is to enable or disable it at runtime instead of boottime, I must warn you that this is going to complicate the nohz code a lot (and also perhaps sched and RCU).

Yeah, that would be the most flexible (not to mention invasive). That said, users of nohz_full are likely gonna be very few and far between, so maybe no big deal. I'm just looking at it from a distro perspective, at what it can and cannot do and what it costs, to see how it would best be served once baked.

> I've already been eyed by vulturous frozen sharks flying in circles above me lately after a few overengineering visions.

:)

> And given that the full nohz code is still in its infancy, it's probably not the right time to expand it that way. I haven't yet heard of users who have made it past the testing stage of full nohz.
>
> We'll probably extend it that way in the future. But likely not in the near future.

Yeah, understood, I'm just measuring and pondering potentials.

-Mike

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-15  3:18 ` Mike Galbraith
  2014-05-15 14:45 ` Paul E. McKenney
  2014-05-18  4:22 ` Mike Galbraith
@ 2014-05-19 10:54 ` Peter Zijlstra
  2 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-19 10:54 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: paulmck, Paul Gortmaker, linux-kernel, linux-rt-users,
      Ingo Molnar, Steven Rostedt, Thomas Gleixner

On Thu, May 15, 2014 at 05:18:51AM +0200, Mike Galbraith wrote:
> On Wed, 2014-05-14 at 08:44 -0700, Paul E. McKenney wrote:
>
> > In practice, not sure how much testing CONFIG_NO_HZ_FULL=y has received
> > for -rt kernels in production environments.
>
> I took 3.14-rt out for a quick spin on my 64 core box; it didn't work at
> all with 60 cores isolated.  I didn't have time to rummage, but it looks
> like there are still bugs to squash.
>
> Biggest problem with CONFIG_NO_HZ_FULL is the price tag.  It just raped
> fast mover performance last time I measured.

Syscall entry and exit times are through the roof with it.  So anything
doing loads of syscalls will suffer badly.
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
  2014-05-14 15:44 ` Paul E. McKenney
@ 2014-05-19 12:40 ` Peter Zijlstra
  2014-05-22 19:40 ` Paul Gortmaker
  2014-11-27 11:21 ` Wanpeng Li
  2 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2014-05-19 12:40 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Steven Rostedt,
      Thomas Gleixner, Paul E. McKenney

On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
>
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context. Consider the following test:

So the alternative approach is something like the below, where we will
not let it borrow more than the global bandwidth per cpu.

This whole sharing thing is completely fail anyway, but I really
wouldn't know what else to do and keep allowing RT tasks to set random
cpu affinities.
---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7386,10 +7386,59 @@ static int __rt_schedulable(struct task_
 	return ret;
 }
 
+/*
+ * ret := (a * b) / d
+ */
+static u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
+{
+	/*
+	 * Compute the 128bit product:
+	 *   a * b ->
+	 *   [ a = (ah * 2^32 + al), b = (bh * 2^32 + bl) ]
+	 *   -> (ah * bh) * 2^64 + (ah * bl + al * bh) * 2^32 + al * bl
+	 */
+	u32 ah = (a >> 32);
+	u32 bh = (b >> 32);
+	u32 al = a;
+	u32 bl = b;
+
+	u64 mh, mm, ml;
+
+	mh = (u64)ah * bh;
+	mm = (u64)ah * bl + (u64)al * bh;
+	ml = (u64)al * bl;
+
+	mh += mm >> 32;
+	mm <<= 32;
+
+	ml += mm;
+	if (ml < mm) /* overflow */
+		mh++;
+
+	/*
+	 * Reduce the 128bit result to fit in a 64bit dividend:
+	 *   m / d -> (m / 2^n) / (d / 2^n)
+	 */
+	while (mh) {
+		ml >>= 1;
+		if (mh & 1)
+			ml |= 1ULL << 63;
+		mh >>= 1;
+		d >>= 1;
+	}
+
+	if (unlikely(!d))
+		return ml;
+
+	return div64_u64(ml, d);
+}
+
 static int tg_set_rt_bandwidth(struct task_group *tg,
 		u64 rt_period, u64 rt_runtime)
 {
 	int i, err = 0;
+	u64 g_period = global_rt_period();
+	u64 g_runtime = global_rt_runtime();
 
 	mutex_lock(&rt_constraints_mutex);
 	read_lock(&tasklist_lock);
@@ -7400,6 +7449,9 @@ static int tg_set_rt_bandwidth(struct ta
 	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
 	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
 	tg->rt_bandwidth.rt_runtime = rt_runtime;
+	tg->rt_bandwidth.rt_max_runtime = (g_runtime == RUNTIME_INF) ?
+		rt_period :
+		mul_u64_u64_div_u64(rt_period, g_runtime, g_period);
 
 	for_each_possible_cpu(i) {
 		struct rt_rq *rt_rq = tg->rt_rq[i];
@@ -7577,6 +7629,7 @@ static int sched_rt_global_validate(void
 static void sched_rt_do_global(void)
 {
 	def_rt_bandwidth.rt_runtime = global_rt_runtime();
+	def_rt_bandwidth.rt_max_runtime = global_rt_runtime();
 	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
 }
 
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -614,12 +614,12 @@ static int do_balance_runtime(struct rt_
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
 	int i, weight, more = 0;
-	u64 rt_period;
+	u64 rt_max_runtime;
 
 	weight = cpumask_weight(rd->span);
 
 	raw_spin_lock(&rt_b->rt_runtime_lock);
-	rt_period = ktime_to_ns(rt_b->rt_period);
+	rt_max_runtime = rt_b->rt_max_runtime;
 	for_each_cpu(i, rd->span) {
 		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
 		s64 diff;
@@ -643,12 +643,12 @@ static int do_balance_runtime(struct rt_
 		diff = iter->rt_runtime - iter->rt_time;
 		if (diff > 0) {
 			diff = div_u64((u64)diff, weight);
-			if (rt_rq->rt_runtime + diff > rt_period)
-				diff = rt_period - rt_rq->rt_runtime;
+			if (rt_rq->rt_runtime + diff > rt_max_runtime)
+				diff = rt_max_runtime - rt_rq->rt_runtime;
 			iter->rt_runtime -= diff;
 			rt_rq->rt_runtime += diff;
 			more = 1;
-			if (rt_rq->rt_runtime == rt_period) {
+			if (rt_rq->rt_runtime == rt_max_runtime) {
 				raw_spin_unlock(&iter->rt_runtime_lock);
 				break;
 			}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -124,6 +124,7 @@ struct rt_bandwidth {
 	raw_spinlock_t		rt_runtime_lock;
 	ktime_t			rt_period;
 	u64			rt_runtime;
+	u64			rt_max_runtime;
 	struct hrtimer		rt_period_timer;
 };
 /*
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-19 12:40 ` Peter Zijlstra
@ 2014-05-22 19:40 ` Paul Gortmaker
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Gortmaker @ 2014-05-22 19:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-rt-users, Ingo Molnar, Steven Rostedt,
      Thomas Gleixner, Paul E. McKenney

On 14-05-19 08:40 AM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 11:08:35AM -0400, Paul Gortmaker wrote:
>> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
>> ("sched: rt-group: smp balancing") the concept of borrowing per
>> cpu rt_runtime from one core to another was introduced.
>>
>> However, this prevents the RT throttling message from ever being
>> emitted when someone does a common (but mistaken) attempt at
>> using too much CPU in RT context. Consider the following test:
>
> So the alternative approach is something like the below, where we will
> not let it borrow more than the global bandwidth per cpu.
>
> This whole sharing thing is completely fail anyway, but I really
> wouldn't know what else to do and keep allowing RT tasks to set random
> cpu affinities.

So, for the record, this does seem to work, in the sense that the
original test of:

  echo "main() {for(;;);}" > full_load.c
  gcc full_load.c -o full_load
  taskset -c 1 ./full_load &
  chrt -r -p 80 `pidof full_load`

will emit the sched delayed throttling message instead of the less
informative (and 20s delayed) RCU stall.  Which IMHO is a win in terms
of being more friendly to the less informed users out there.  I'd
re-tested it on today's linux-next tree, with RT_GROUP_SCHED off.

The downside is that we get another tuning knob that will inevitably
end up in /proc/sys/kernel/ and we'll need to explain somewhere how the
new max_runtime relates to the existing rt_runtime and rt_period.

I'm still unsure what the best solution for mainline is.
Clearly just defaulting the sched feat to false is the simplest, and
given your description of it as "fail" perhaps that does make sense. :)

Paul.
--

> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7386,10 +7386,59 @@ static int __rt_schedulable(struct task_
>  	return ret;
>  }
>  
> +/*
> + * ret := (a * b) / d
> + */
> +static u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> +{
> +	/*
> +	 * Compute the 128bit product:
> +	 *   a * b ->
> +	 *   [ a = (ah * 2^32 + al), b = (bh * 2^32 + bl) ]
> +	 *   -> (ah * bh) * 2^64 + (ah * bl + al * bh) * 2^32 + al * bl
> +	 */
> +	u32 ah = (a >> 32);
> +	u32 bh = (b >> 32);
> +	u32 al = a;
> +	u32 bl = b;
> +
> +	u64 mh, mm, ml;
> +
> +	mh = (u64)ah * bh;
> +	mm = (u64)ah * bl + (u64)al * bh;
> +	ml = (u64)al * bl;
> +
> +	mh += mm >> 32;
> +	mm <<= 32;
> +
> +	ml += mm;
> +	if (ml < mm) /* overflow */
> +		mh++;
> +
> +	/*
> +	 * Reduce the 128bit result to fit in a 64bit dividend:
> +	 *   m / d -> (m / 2^n) / (d / 2^n)
> +	 */
> +	while (mh) {
> +		ml >>= 1;
> +		if (mh & 1)
> +			ml |= 1ULL << 63;
> +		mh >>= 1;
> +		d >>= 1;
> +	}
> +
> +	if (unlikely(!d))
> +		return ml;
> +
> +	return div64_u64(ml, d);
> +}
> +
>  static int tg_set_rt_bandwidth(struct task_group *tg,
>  		u64 rt_period, u64 rt_runtime)
>  {
>  	int i, err = 0;
> +	u64 g_period = global_rt_period();
> +	u64 g_runtime = global_rt_runtime();
>  
>  	mutex_lock(&rt_constraints_mutex);
>  	read_lock(&tasklist_lock);
> @@ -7400,6 +7449,9 @@ static int tg_set_rt_bandwidth(struct ta
>  	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
>  	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
>  	tg->rt_bandwidth.rt_runtime = rt_runtime;
> +	tg->rt_bandwidth.rt_max_runtime = (g_runtime == RUNTIME_INF) ?
> +		rt_period :
> +		mul_u64_u64_div_u64(rt_period, g_runtime, g_period);
>  
>  	for_each_possible_cpu(i) {
>  		struct rt_rq *rt_rq = tg->rt_rq[i];
> @@ -7577,6 +7629,7 @@ static int sched_rt_global_validate(void
>  static void sched_rt_do_global(void)
>  {
>  	def_rt_bandwidth.rt_runtime = global_rt_runtime();
> +	def_rt_bandwidth.rt_max_runtime = global_rt_runtime();
>  	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
>  }
>  
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -614,12 +614,12 @@ static int do_balance_runtime(struct rt_
>  	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>  	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
>  	int i, weight, more = 0;
> -	u64 rt_period;
> +	u64 rt_max_runtime;
>  
>  	weight = cpumask_weight(rd->span);
>  
>  	raw_spin_lock(&rt_b->rt_runtime_lock);
> -	rt_period = ktime_to_ns(rt_b->rt_period);
> +	rt_max_runtime = rt_b->rt_max_runtime;
>  	for_each_cpu(i, rd->span) {
>  		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
>  		s64 diff;
> @@ -643,12 +643,12 @@ static int do_balance_runtime(struct rt_
>  		diff = iter->rt_runtime - iter->rt_time;
>  		if (diff > 0) {
>  			diff = div_u64((u64)diff, weight);
> -			if (rt_rq->rt_runtime + diff > rt_period)
> -				diff = rt_period - rt_rq->rt_runtime;
> +			if (rt_rq->rt_runtime + diff > rt_max_runtime)
> +				diff = rt_max_runtime - rt_rq->rt_runtime;
>  			iter->rt_runtime -= diff;
>  			rt_rq->rt_runtime += diff;
>  			more = 1;
> -			if (rt_rq->rt_runtime == rt_period) {
> +			if (rt_rq->rt_runtime == rt_max_runtime) {
>  				raw_spin_unlock(&iter->rt_runtime_lock);
>  				break;
>  			}
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -124,6 +124,7 @@ struct rt_bandwidth {
>  	raw_spinlock_t		rt_runtime_lock;
>  	ktime_t			rt_period;
>  	u64			rt_runtime;
> +	u64			rt_max_runtime;
>  	struct hrtimer		rt_period_timer;
>  };
>  /*
* Re: [PATCH] sched/rt: don't try to balance rt_runtime when it is futile
  2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
  2014-05-14 15:44 ` Paul E. McKenney
  2014-05-19 12:40 ` Peter Zijlstra
@ 2014-11-27 11:21 ` Wanpeng Li
  2 siblings, 0 replies; 31+ messages in thread
From: Wanpeng Li @ 2014-11-27 11:21 UTC (permalink / raw)
  To: Paul Gortmaker, linux-kernel
  Cc: linux-rt-users, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
      Thomas Gleixner, Paul E. McKenney

Hi Paul,

On 5/14/14, 11:08 PM, Paul Gortmaker wrote:
> As of the old commit ac086bc22997a2be24fc40fc8d46522fe7e03d11
> ("sched: rt-group: smp balancing") the concept of borrowing per
> cpu rt_runtime from one core to another was introduced.
>
> However, this prevents the RT throttling message from ever being
> emitted when someone does a common (but mistaken) attempt at
> using too much CPU in RT context. Consider the following test:
>
>   echo "main() {for(;;);}" > full_load.c
>   gcc full_load.c -o full_load
>   taskset -c 1 ./full_load &
>   chrt -r -p 80 `pidof full_load`

I tried this on 3.18-rc6 with CONFIG_RCU_CPU_STALL_TIMEOUT=60 and
SCHED_FEAT(RT_RUNTIME_SHARE, true), however I don't see the RCU stall
warning; what am I missing?

Regards,
Wanpeng Li

> When run on x86_64 defconfig, what happens is as follows:
>
> -task runs on core1 for 95% of an rt_period as documented in
>  the file Documentation/scheduler/sched-rt-group.txt
>
> -at 95%, the code in balance_runtime sees this threshold and
>  calls do_balance_runtime()
>
> -do_balance_runtime sees that core 1 is in need, and does this:
> ---------------
> 	if (rt_rq->rt_runtime + diff > rt_period)
> 		diff = rt_period - rt_rq->rt_runtime;
> 	iter->rt_runtime -= diff;
> 	rt_rq->rt_runtime += diff;
> ---------------
> which extends core1's rt_runtime by 5%, making it 100% of rt_period
> by stealing 5% from core0 (or possibly some other core).
>
> However, the next time core1's rt_rq enters sched_rt_runtime_exceeded(),
> we hit this near the top of that function:
> ---------------
> 	if (runtime >= sched_rt_period(rt_rq))
> 		return 0;
> ---------------
> and hence we'll _never_ look at/set any of the throttling checks and
> messages in sched_rt_runtime_exceeded().  Instead, we will happily
> plod along for CONFIG_RCU_CPU_STALL_TIMEOUT seconds, at which point
> the RCU subsystem will get angry and trigger an NMI in response to
> what it rightly sees as a WTF situation.
>
> Granted, there are lots of ways you can do bad things to yourself with
> RT, but in the current zeitgeist of multicore systems with people
> dedicating individual cores to individual tasks, I'd say the above is
> common enough that we should react to it sensibly, and an RCU stall
> really doesn't translate well to an end user vs a simple message that
> says "throttling activated".
>
> One way to get the throttle message instead of the ambiguous and lengthy
> NMI triggered all core backtrace of the RCU stall is to change the
> SCHED_FEAT(RT_RUNTIME_SHARE, true) to false.  One could make a good
> case for this being the default for the out-of-tree preempt-rt series,
> since folks using that are more apt to be manually tuning the system
> and won't want an invisible hand coming in and making changes.
>
> However, in mainline, where it is more likely that there will be
> n+x (x>0) RT tasks on an n core system, we can leave the sharing on,
> and still avoid the RCU stalls by noting that there is no point in
> trying to balance when there are no tasks to migrate, or only a
> single RT task is present.  Inflating the rt_runtime does nothing
> in this case other than defeat sched_rt_runtime_exceeded().
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>
> [I'd mentioned a similar use case here: https://lkml.org/lkml/2013/3/6/338
> and tglx asked why they wouldn't see the throttle message; it is only
> now that I had a chance to dig in and figure out why.  Oh, and the patch
> is against linux-next, in case that matters...]
>
>  kernel/sched/rt.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ea4d500..698aac9 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -774,6 +774,15 @@ static int balance_runtime(struct rt_rq *rt_rq)
>  	if (!sched_feat(RT_RUNTIME_SHARE))
>  		return more;
>  
> +	/*
> +	 * Stealing from another core won't help us at all if
> +	 * we have nothing to migrate over there, or only one
> +	 * task that is running up all the rt_time.  In fact it
> +	 * will just inhibit the throttling message in that case.
> +	 */
> +	if (!rt_rq->rt_nr_migratory || rt_rq->rt_nr_total == 1)
> +		return more;
> +
>  	if (rt_rq->rt_time > rt_rq->rt_runtime) {
>  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>  		more = do_balance_runtime(rt_rq);
end of thread, other threads:[~2014-11-27 15:31 UTC | newest]

Thread overview: 31+ messages:
2014-05-14 15:08 [PATCH] sched/rt: don't try to balance rt_runtime when it is futile Paul Gortmaker
2014-05-14 15:44 ` Paul E. McKenney
2014-05-14 19:11 ` Paul Gortmaker
2014-05-14 19:27 ` Paul E. McKenney
2014-05-15  2:49 ` Mike Galbraith
2014-05-15 14:09 ` Paul Gortmaker
2014-11-27  9:17 ` Wanpeng Li
2014-11-27 15:31 ` Mike Galbraith
2014-11-27 11:36 ` Wanpeng Li
2014-05-15  3:18 ` Mike Galbraith
2014-05-15 14:45 ` Paul E. McKenney
2014-05-15 17:27 ` Mike Galbraith
2014-05-18  4:22 ` Mike Galbraith
2014-05-18  5:20 ` Paul E. McKenney
2014-05-18  8:36 ` Mike Galbraith
2014-05-18 15:58 ` Paul E. McKenney
2014-05-19  2:44 ` Mike Galbraith
2014-05-19  5:34 ` Paul E. McKenney
2014-05-20 14:53 ` Frederic Weisbecker
2014-05-20 15:53 ` Paul E. McKenney
2014-05-20 16:24 ` Frederic Weisbecker
2014-05-20 16:36 ` Peter Zijlstra
2014-05-20 17:20 ` Paul E. McKenney
2014-05-21  4:29 ` Mike Galbraith
2014-05-21  4:18 ` Mike Galbraith
2014-05-21 12:03 ` Paul E. McKenney
2014-05-21  3:52 ` Mike Galbraith
2014-05-19 10:54 ` Peter Zijlstra
2014-05-19 12:40 ` Peter Zijlstra
2014-05-22 19:40 ` Paul Gortmaker
2014-11-27 11:21 ` Wanpeng Li