* 3.0.14-rt31 + 64 cores = very bad jitter == highly synchronized tick?
From: Mike Galbraith @ 2011-12-24  9:06 UTC (permalink / raw)
  To: RT; +Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

Greetings,

I'm trying to convince 3.0-rt to perform on a 64 core box, and having a
devil of a time with the darn thing.  I have a wild theory that cores
are much more closely synchronized in newer kernels, and that's causing
massive QPI jabbering and xtime lock contention as cores bang
cpupri_set() and ktime_get() in lockstep.

The 33-rt kernel in the numbers below has Steven's cpupri fix, and there
it works a treat.  In 3.0-rt, it does NOT save the day, and the only
reason I can imagine for observed behavior is that cores are ticking in
lockstep.

Anyway, tick perturbations are definitely much larger in 3.0-rt than in
33-rt, munching ~1.4% of every core vs ~0.19%.

Has anything been done between 33 and 3.0 that would account for this?

Numbers and such below.

	-Mike

Test environment: nohz=off, cores 4-63 isolated via cpusets.  Start a
perturbation measurement proggy (tight self-calibrating rdtsc loop) as
the only thing running on isolated core 63.
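
The pert source isn't included here; the guts are roughly this minimal
sketch (an illustration, not the real tool): spin on rdtsc, calibrate
the cost of one undisturbed loop pass, then report any delta beyond
that as time stolen by ticks/IRQs/whatnot.

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint64_t prev, threshold = ~0ULL;
	int i;

	/* Self-calibrate: the smallest back-to-back delta is the cost
	 * of one undisturbed loop pass. */
	prev = rdtsc();
	for (i = 0; i < 1000000; i++) {
		uint64_t now = rdtsc();

		if (now - prev < threshold)
			threshold = now - prev;
		prev = now;
	}
	threshold *= 2;		/* anything bigger is a perturbation */

	/* Measure: any delta above threshold is time stolen from this
	 * (supposedly undisturbed) busy loop. */
	prev = rdtsc();
	for (;;) {
		uint64_t now = rdtsc(), delta = now - prev;

		if (delta > threshold)
			printf("perturbation: %llu cycles\n",
			       (unsigned long long)delta);
		prev = now;
	}
	return 0;
}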

(ponders telling customer that 10 x 8 core synchronized boxen have more
blinky lights, make a much sexier product than boring 1 x 80 core DL980:)


2.6.33.20-rt31
vogelweide:/abuild/mike/:[130]# sh -c 'echo $$ > /cpusets/rtcpus/tasks;taskset -c 63 pert 5'
2260.86 MHZ CPU
perturbation threshold 0.024 usecs.
pert/s:     1000 >14.27us:        1 min:  1.86 max: 16.22 avg:  1.90 sum/s:  1903us overhead: 0.19%
pert/s:     1000 >13.72us:        2 min:  1.86 max: 15.79 avg:  1.91 sum/s:  1909us overhead: 0.19%
pert/s:     1000 >13.23us:        1 min:  1.85 max: 15.59 avg:  1.91 sum/s:  1914us overhead: 0.19%


3.0.14-rt31 virgin
vogelweide:/abuild/mike/:[130]# sh -c 'echo $$ > /cpusets/rtcpus/tasks;taskset -c 63 pert 5'
2261.09 MHZ CPU
perturbation threshold 0.024 usecs.
pert/s:     1001 >57.09us:       52 min:  1.10 max: 83.94 avg: 14.38 sum/s: 14399us overhead: 1.44%
pert/s:     1001 >55.94us:       45 min:  1.10 max: 77.78 avg: 13.43 sum/s: 13455us overhead: 1.35%
pert/s:     1001 >54.87us:       65 min:  1.10 max: 75.77 avg: 14.57 sum/s: 14589us overhead: 1.46%


3.0.14-rt31 non-virgin, where I'm squabbling with this darn thing 
vogelweide:/abuild/mike/:[130]# sh -c 'echo $$ > /cpusets/rtcpus/tasks;taskset -c 63 pert 5'
2260.90 MHZ CPU
perturbation threshold 0.024 usecs.
pert/s:     1001 >15.15us:      613 min:  1.10 max: 62.47 avg:  6.88 sum/s:  6895us overhead: 0.69%
pert/s:     1001 >16.55us:      719 min:  1.10 max: 50.05 avg:  8.38 sum/s:  8394us overhead: 0.84%
pert/s:     1001 >17.77us:      795 min:  1.13 max: 48.51 avg:  8.98 sum/s:  8997us overhead: 0.90%
pert/s:     1001 >19.22us:      640 min:  1.10 max: 56.00 avg:  8.51 sum/s:  8524us overhead: 0.85%
pert/s:     1001 >20.36us:      560 min:  1.10 max: 52.73 avg:  8.41 sum/s:  8428us overhead: 0.84%
pert/s:     1001 >21.38us:      561 min:  1.11 max: 52.65 avg:  8.60 sum/s:  8611us overhead: 0.86%
pert/s:     1001 >22.21us:      583 min:  1.14 max: 50.35 avg:  8.90 sum/s:  8913us overhead: 0.89%
pert/s:     1001 >22.75us:      473 min:  1.12 max: 46.76 avg:  8.50 sum/s:  8516us overhead: 0.85%
pert/s:     1001 >23.42us:      383 min:  1.11 max: 51.04 avg:  7.86 sum/s:  7873us overhead: 0.79%
pert/s:     1001 >23.89us:      421 min:  1.11 max: 47.42 avg:  8.81 sum/s:  8825us overhead: 0.88%
(bend/spindle/mutilate below: echo RT_ISOLATE > sched_features)
pert/s:     1001 >18.74us:        2 min:  1.07 max: 22.62 avg:  2.57 sum/s:  2570us overhead: 0.26%
pert/s:     1001 >18.16us:        1 min:  1.13 max: 23.28 avg:  2.56 sum/s:  2566us overhead: 0.26%
pert/s:     1001 >17.64us:        1 min:  1.09 max: 23.30 avg:  2.61 sum/s:  2610us overhead: 0.26%
pert/s:     1001 >17.22us:        2 min:  1.09 max: 24.44 avg:  2.59 sum/s:  2593us overhead: 0.26%
pert/s:     1001 >16.21us:        0 min:  1.06 max: 11.46 avg:  2.62 sum/s:  2620us overhead: 0.26%
pert/s:     1001 >15.33us:        0 min:  1.14 max: 12.40 avg:  2.59 sum/s:  2597us overhead: 0.26%
pert/s:     1001 >14.83us:        1 min:  1.10 max: 17.94 avg:  2.59 sum/s:  2599us overhead: 0.26%
pert/s:     1001 >14.03us:        0 min:  1.07 max: 11.20 avg:  2.60 sum/s:  2605us overhead: 0.26%
pert/s:     1001 >13.84us:        1 min:  1.12 max: 21.51 avg:  2.62 sum/s:  2629us overhead: 0.26%
pert/s:     1001 >13.63us:        4 min:  1.12 max: 20.90 avg:  2.60 sum/s:  2604us overhead: 0.26%


profile CPU 63
 NO_RT_ISOLATE                                                RT_ISOLATE                                             (no hacks)
 3.0.14-rt31                                                  3.0.14-rt31                                            2.6.33-rt31
 47.83%  [kernel]  [k] cpupri_set                             8.67%  [kernel]  [k] tick_sched_timer                  8.28%  [kernel]  [k] cpupri_set
 18.38%  [kernel]  [k] native_write_msr_safe                  7.03%  [kernel]  [k] __schedule                        7.52%  [kernel]  [k] __schedule
  6.83%  [kernel]  [k] cpuacct_charge                         6.42%  [kernel]  [k] native_write_msr_safe             6.30%  [kernel]  [k] apic_timer_interrupt
  2.19%  [kernel]  [k] rcu_enter_nohz                         6.02%  [kernel]  [k] apic_timer_interrupt              5.66%  [kernel]  [k] native_write_msr_safe
  2.12%  [kernel]  [k] __schedule                             3.39%  [kernel]  [k] __switch_to                       3.13%  [kernel]  [k] scheduler_tick
  1.95%  [kernel]  [k] apic_timer_interrupt                   2.73%  [kernel]  [k] ktime_get                         2.69%  [kernel]  [k] _raw_spin_lock
  1.91%  [kernel]  [k] tick_sched_timer                       2.21%  [kernel]  [k] rcu_preempt_note_context_switch   2.61%  [kernel]  [k] __switch_to
  1.56%  [kernel]  [k] ktime_get                              1.97%  [kernel]  [k] rcu_check_callbacks               2.38%  [kernel]  [k] try_to_wake_up
  1.20%  [kernel]  [k] run_timer_softirq                      1.85%  [kernel]  [k] run_posix_cpu_timers              2.16%  [kernel]  [k] native_read_msr_safe
  0.72%  [kernel]  [k] __switch_to                            1.63%  [kernel]  [k] run_timer_softirq                 1.99%  [kernel]  [k] native_read_tsc
  0.61%  [kernel]  [k] rcu_preempt_note_context_switch        1.63%  [kernel]  [k] common_interrupt                  1.98%  [kernel]  [k] update_curr_rt
  0.55%  [kernel]  [k] scheduler_tick                         1.63%  [kernel]  [k] _raw_spin_unlock_irq              1.94%  [kernel]  [k] perf_event_task_sched_in
  0.54%  [kernel]  [k] __thread_do_softirq                    1.60%  [kernel]  [k] __thread_do_softirq               1.89%  [kernel]  [k] ktime_get
  0.51%  [kernel]  [k] __rcu_pending                          1.58%  [kernel]  [k] _raw_spin_lock                    1.87%  [kernel]  [k] cpuacct_charge
  0.51%  [kernel]  [k] _raw_spin_lock                         1.46%  [kernel]  [k] __rcu_pending                     1.80%  [kernel]  [k] run_ksoftirqd
  0.48%  [kernel]  [k] native_read_tsc                        1.36%  [kernel]  [k] wakeup_softirqd                   1.73%  [kernel]  [k] _raw_spin_unlock
  0.45%  [kernel]  [k] hrtimer_interrupt                      1.35%  [kernel]  [k] finish_task_switch                1.71%  [kernel]  [k] perf_adjust_period
  0.44%  [kernel]  [k] raise_softirq                          1.31%  [kernel]  [k] cpuacct_charge                    1.46%  [kernel]  [k] __dequeue_entity
  0.33%  [kernel]  [k] __enqueue_rt_entity                    1.28%  [kernel]  [k] handle_pending_softirqs           1.33%  [kernel]  [k] rb_insert_color
  0.31%  [kernel]  [k] rt_spin_unlock                         1.28%  [kernel]  [k] scheduler_tick                    1.28%  [kernel]  [k] __rcu_pending


profile all 64 CPUs
 (RT_ISOLATE hack turned back off)
 3.0.14-rt31                                                  2.6.33.20-rt31
 61.08%  [kernel]      [k] cpupri_set                         27.50%  [kernel]      [k] apic_timer_interrupt
 15.57%  [kernel]      [k] ktime_get                           7.52%  [kernel]      [k] cpupri_set
  5.79%  [kernel]      [k] apic_timer_interrupt                5.35%  [kernel]      [k] __schedule
  4.31%  [kernel]      [k] rcu_enter_nohz                      4.75%  [kernel]      [k] _raw_spin_lock
  2.84%  [kernel]      [k] cpuacct_charge                      3.88%  [kernel]      [k] scheduler_tick
  1.17%  [kernel]      [k] __schedule                          2.81%  [kernel]      [k] ktime_get
  0.92%  [kernel]      [k] tick_sched_timer                    2.59%  [kernel]      [k] tick_check_oneshot_broadcast
  0.65%  [kernel]      [k] native_write_msr_safe               2.50%  [kernel]      [k] native_write_msr_safe
  0.53%  [kernel]      [k] scheduler_tick                      2.28%  [kernel]      [k] native_read_tsc
  0.41%  [kernel]      [k] tick_check_oneshot_broadcast        2.22%  [kernel]      [k] native_read_msr_safe
  0.35%  [kernel]      [k] native_load_tls                     1.11%  [kernel]      [k] __switch_to
  0.34%  [kernel]      [k] update_cpu_load                     1.05%  [kernel]      [k] read_tsc
  0.27%  [kernel]      [k] __rcu_pending                       1.03%  [kernel]      [k] rb_erase
  0.23%  [kernel]      [k] _raw_spin_lock                      1.00%  [kernel]      [k] rcu_sched_qs
  0.23%  [kernel]      [k] __thread_do_softirq                 0.94%  [kernel]      [k] resched_task
  0.21%  [kernel]      [k] run_timer_softirq                   0.93%  [kernel]      [k] run_ksoftirqd
  0.19%  [kernel]      [k] read_tsc                            0.92%  [kernel]      [k] atomic_notifier_call_chain
  0.19%  [kernel]      [k] _raw_spin_lock_irqsave              0.91%  [kernel]      [k] _raw_spin_unlock
  0.19%  [kernel]      [k] native_read_tsc                     0.87%  [kernel]      [k] __rcu_read_unlock
  0.17%  [kernel]      [k] rcu_preempt_note_context_switch     0.87%  [kernel]      [k] native_sched_clock
  0.16%  [kernel]      [k] __switch_to                         0.87%  [kernel]      [k] x86_pmu_read
  0.14%  [kernel]      [k] rt_spin_lock                        0.85%  [kernel]      [k] perf_adjust_period
  0.13%  [kernel]      [k] profile_tick                        0.83%  [kernel]      [k] try_to_wake_up
  0.13%  [kernel]      [k] rt_spin_unlock                      0.81%  [kernel]      [k] tick_sched_timer
  0.13%  [kernel]      [k] finish_task_switch                  0.80%  [kernel]      [k] __perf_pending_run
  0.11%  [kernel]      [k] run_ksoftirqd                       0.77%  [kernel]      [k] sched_clock_cpu
  0.11%  [kernel]      [k] handle_pending_softirqs             0.70%  [kernel]      [k] finish_task_switch
  0.10%  [kernel]      [k] smp_apic_timer_interrupt            0.68%  [kernel]      [k] __atomic_notifier_call_chain
  0.09%  [kernel]      [k] tick_nohz_stop_sched_tick           0.67%  [kernel]      [k] hrtimer_interrupt
  0.09%  [kernel]      [k] pick_next_task_rt                   0.67%  [kernel]      [k] __remove_hrtimer
  0.09%  [kernel]      [k] _raw_spin_lock_irq                  0.66%  [kernel]      [k] save_args
  0.09%  [kernel]      [k] timerqueue_del                      0.64%  [kernel]      [k] rt_spin_lock
  0.08%  [kernel]      [k] hrtimer_interrupt                   0.61%  [kernel]      [k] _raw_spin_lock_irq
  0.07%  [kernel]      [k] pick_next_task_stop                 0.58%  [kernel]      [k] idle_cpu
  0.07%  [kernel]      [k] migrate_enable                      0.56%  [kernel]      [k] __rcu_pending
  0.07%  [kernel]      [k] wakeup_softirqd                     0.56%  [kernel]      [k] account_process_tick
  0.07%  [kernel]      [k] native_sched_clock                  0.55%  [kernel]      [k] tick_nohz_stop_sched_tick
  0.06%  [kernel]      [k] __dequeue_rt_entity                 0.51%  [kernel]      [k] rb_next
  0.06%  [kernel]      [k] update_curr_rt                      0.46%  [kernel]      [k] rt_spin_unlock
  0.06%  [kernel]      [k] _raw_spin_unlock_irq                0.45%  [kernel]      [k] rcu_irq_enter

RT_ISOLATE cpupri_set() isolation hacklet

---
 kernel/sched_features.h |    5 +++++
 kernel/sched_rt.c       |   17 +++++++++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -79,3 +79,8 @@ SCHED_FEAT(TTWU_QUEUE, 0)
 
 SCHED_FEAT(FORCE_SD_OVERLAP, 0)
 SCHED_FEAT(RT_RUNTIME_SHARE, 1)
+
+/*
+ * Protect isolated CPUs from cpupri latency
+ */
+SCHED_FEAT(RT_ISOLATE, 1)
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -876,6 +876,11 @@ void dec_rt_group(struct sched_rt_entity
 
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+static inline int rq_isolate(struct rq *rq)
+{
+	return sched_feat(RT_ISOLATE) && !rq->sd;
+}
+
 static inline
 void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
@@ -884,7 +889,8 @@ void inc_rt_tasks(struct sched_rt_entity
 	WARN_ON(!rt_prio(prio));
 	rt_rq->rt_nr_running++;
 
-	inc_rt_prio(rt_rq, prio);
+	if (!rq_isolate(rq_of_rt_rq(rt_rq)))
+		inc_rt_prio(rt_rq, prio);
 	inc_rt_migration(rt_se, rt_rq);
 	inc_rt_group(rt_se, rt_rq);
 }
@@ -896,7 +902,8 @@ void dec_rt_tasks(struct sched_rt_entity
 	WARN_ON(!rt_rq->rt_nr_running);
 	rt_rq->rt_nr_running--;
 
-	dec_rt_prio(rt_rq, rt_se_prio(rt_se));
+	if (!rq_isolate(rq_of_rt_rq(rt_rq)))
+		dec_rt_prio(rt_rq, rt_se_prio(rt_se));
 	dec_rt_migration(rt_se, rt_rq);
 	dec_rt_group(rt_se, rt_rq);
 }
@@ -1110,6 +1117,9 @@ static void check_preempt_equal_prio(str
 	if (rq->curr->rt.nr_cpus_allowed == 1)
 		return;
 
+	if (rq_isolate(rq))
+		return;
+
 	if (p->rt.nr_cpus_allowed != 1
 	    && cpupri_find(&rq->rd->cpupri, p, NULL))
 		return;
@@ -1300,6 +1310,9 @@ static int find_lowest_rq(struct task_st
 	if (task->rt.nr_cpus_allowed == 1)
 		return -1; /* No other targets possible */
 
+	if (rq_isolate(cpu_rq(this_cpu)))
+		return -1;
+
 	if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask))
 		return -1; /* No targets found */
 

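A note on the mechanism, as I read it (not part of the patch itself):
rq->sd is NULL on CPUs that are detached from all scheduler domains,
e.g. cores placed in an isolated cpuset with load balancing off, so
rq_isolate() selects exactly the isolated cores, and those stop
publishing their priority changes to the shared root-domain cpupri
structure.  With SCHED_DEBUG, the feature can be flipped at runtime,
which is how the mid-run switch in the pert log above was done:

   echo RT_ISOLATE > /sys/kernel/debug/sched_features     # enable
   echo NO_RT_ISOLATE > /sys/kernel/debug/sched_features  # disable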




* Re: 3.0.14-rt31 + 64 cores = very bad jitter == highly synchronized tick?
From: Mike Galbraith @ 2011-12-25  7:31 UTC (permalink / raw)
  To: RT; +Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Sat, 2011-12-24 at 10:06 +0100, Mike Galbraith wrote:
> Greetings,
> 
> I'm trying to convince 3.0-rt to perform on a 64 core box, and having a
> devil of a time with the darn thing.  I have a wild theory that cores
> are much more closely synchronized in newer kernels, and that's causing
> massive QPI jabbering and xtime lock contention as cores bang
> cpupri_set() and ktime_get() in lockstep.

Seems not so wild a theory.

          <idle>-0     [055]  1285.013088: mwait_idle <-cpu_idle
          <idle>-0     [053]  1285.013860: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [043]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [053]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [044]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [043]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [061]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [054]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [038]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [053]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [044]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [043]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [008]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [032]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [051]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [024]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [054]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [038]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [053]  1285.013861: rcu_irq_enter <-irq_enter
          <idle>-0     [044]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [045]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [006]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [043]  1285.013861: rcu_irq_enter <-irq_enter
          <idle>-0     [029]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [014]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [032]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [042]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [031]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [051]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [024]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [054]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [015]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [027]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [038]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [044]  1285.013861: rcu_irq_enter <-irq_enter
          <idle>-0     [053]  1285.013861: rcu_exit_nohz <-rcu_irq_enter
          <idle>-0     [035]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [045]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [022]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [028]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [050]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [043]  1285.013861: rcu_exit_nohz <-rcu_irq_enter
          <idle>-0     [049]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [061]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [019]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [032]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [029]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [014]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [024]  1285.013861: irq_enter <-smp_apic_timer_interrupt
          <idle>-0     [042]  1285.013861: native_apic_mem_write <-smp_apic_timer_interrupt
          <idle>-0     [039]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <idle>-0     [026]  1285.013861: smp_apic_timer_interrupt <-apic_timer_interrupt
          <....snipage>

Guess I need to fight fire with fire.  Make ticks jitter a little
somehow, so they don't make itimer wakeup jitter a truckload when it
collides with a tick that is busy colliding with a zillion other ticks.

'course that helps the real problem (DRAM sucks) not one bit.

	-Mike



* Re: 3.0.14-rt31 + 64 cores = very bad jitter == highly synchronized tick?
From: Mike Galbraith @ 2011-12-26  8:04 UTC (permalink / raw)
  To: RT; +Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Sun, 2011-12-25 at 08:31 +0100, Mike Galbraith wrote:

> Guess I need to fight fire with fire.  Make ticks jitter a little
> somehow, so they don't make itimer wakeup jitter a truckload when it
> collides with a tick that is busy colliding with a zillion other ticks.

Yup.  Perfect is the enemy of good.

non-virgin:
vogelweide:/abuild/mike/:[1]# sh -c 'echo $$ > /cpusets/rtcpus/tasks;./jitter -c 63 -f 960 -p 99 -t 10 -d 300'
CPU63 priority: 99 timer freq: 960 Hz (1041666 ns) tolerance: 10 usecs, stats interval: 300 secs

jitter:   8.87  min:      3.08 max:     11.95 mean:      4.92 stddev:    0.56
4 > 10 us hits  min:     11.01 max:     11.95 mean:     11.35 stddev:    0.37

jitter:   8.68  min:      3.09 max:     11.77 mean:      4.91 stddev:    0.56
2 > 10 us hits  min:     11.10 max:     11.77 mean:     11.44 stddev:    0.33

jitter:   7.90  min:      3.12 max:     11.02 mean:      4.91 stddev:    0.56
1 > 10 us hits  min:     11.02 max:     11.02 mean:     11.02 stddev:    0.00

virgin:
vogelweide:/abuild/mike/:[1]# sh -c 'echo $$ > /cpusets/rtcpus/tasks;./jitter -c 63 -f 960 -p 99 -t 10 -d 300'
CPU63 priority: 99 timer freq: 960 Hz (1041666 ns) tolerance: 10 usecs, stats interval: 300 secs

jitter:  68.30  min:      2.43 max:     70.72 mean:      6.22 stddev:    6.41
16668 > 10 us hits      min:     11.00 max:     70.72 mean:     28.57 stddev:   13.08

jitter:  71.76  min:      2.56 max:     74.32 mean:      6.29 stddev:    6.61
17257 > 10 us hits      min:     11.00 max:     74.32 mean:     28.95 stddev:   13.24

jitter:  70.51  min:      2.50 max:     73.01 mean:      6.17 stddev:    6.26
16368 > 10 us hits      min:     11.00 max:     73.01 mean:     28.29 stddev:   12.76

I'm still colliding a bit, and overhead is still too high, but poking
the tick in the eye with a sharp stick made it crawl under its rock, so
methinks the tail has been pinned on the right donkey.

	-Mike

64 core DL980 idling, nohz=1, cores 4-63 isolated

non-virgin:
top - 08:26:35 up  2:04,  2 users,  load average: 0.00, 0.01, 0.41
Tasks: 1051 total,   2 running, 1049 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7911876k total,  1004812k used,  6907064k free,    12836k buffers
Swap:  1959924k total,        0k used,  1959924k free,   802324k cached

   PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+   P COMMAND                                                                                             
 22210 root      20   0  9680 2032  928 R      1  0.0   0:00.50  0 top                                                                                                 
     4 root     -41   0     0    0    0 S      0  0.0   0:07.85  0 sirq-timer/0                                                                                        
    22 root     -41   0     0    0    0 S      0  0.0   0:13.24  1 sirq-timer/1                                                                                        
    37 root     -41   0     0    0    0 S      0  0.0   0:13.32  2 sirq-timer/2                                                                                        
    51 root     -41   0     0    0    0 S      0  0.0   0:12.29  3 sirq-timer/3                                                                                        
    65 root     -41   0     0    0    0 S      0  0.0   0:12.07  4 sirq-timer/4                                                                                        
    79 root     -41   0     0    0    0 S      0  0.0   0:12.20  5 sirq-timer/5                                                                                        
    93 root     -41   0     0    0    0 S      0  0.0   0:12.07  6 sirq-timer/6                                                                                        
   121 root     -41   0     0    0    0 S      0  0.0   0:12.32  8 sirq-timer/8                                                                                        
   163 root     -41   0     0    0    0 S      0  0.0   0:12.22 11 sirq-timer/11                                                                                       
   177 root     -41   0     0    0    0 S      0  0.0   0:12.22 12 sirq-timer/12                                                                                       
   191 root     -41   0     0    0    0 S      0  0.0   0:12.25 13 sirq-timer/13                                                                                       
   205 root     -41   0     0    0    0 S      0  0.0   0:12.21 14 sirq-timer/14                                                                                       
   219 root     -41   0     0    0    0 S      0  0.0   0:12.21 15 sirq-timer/15                                                                                       
   233 root     -41   0     0    0    0 S      0  0.0   0:13.54 16 sirq-timer/16

virgin:
top - 08:57:39 up 23 min,  1 user,  load average: 0.00, 0.02, 0.10
Tasks: 468 total,   2 running, 466 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7914276k total,   471268k used,  7443008k free,    84040k buffers
Swap:  1959924k total,        0k used,  1959924k free,   243072k cached

   PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+   P COMMAND      
   179 root      RT   0     0    0    0 S      3  0.0   0:45.89 32 ksoftirqd/32 
   231 root      RT   0     0    0    0 S      3  0.0   0:46.48 40 ksoftirqd/40 
   241 root      RT   0     0    0    0 S      3  0.0   0:46.53 42 ksoftirqd/42 
   246 root      RT   0     0    0    0 S      3  0.0   0:46.27 43 ksoftirqd/43 
   184 root      RT   0     0    0    0 S      3  0.0   0:44.16 33 ksoftirqd/33 
   206 root      RT   0     0    0    0 S      3  0.0   0:44.48 35 ksoftirqd/35 
   211 root      RT   0     0    0    0 R      3  0.0   0:45.19 36 ksoftirqd/36 
   216 root      RT   0     0    0    0 S      3  0.0   0:44.83 37 ksoftirqd/37 
   221 root      RT   0     0    0    0 S      3  0.0   0:43.73 38 ksoftirqd/38 
   226 root      RT   0     0    0    0 S      3  0.0   0:44.73 39 ksoftirqd/39 
   236 root      RT   0     0    0    0 S      3  0.0   0:45.86 41 ksoftirqd/41 
   251 root      RT   0     0    0    0 S      3  0.0   0:43.64 44 ksoftirqd/44 
   323 root      RT   0     0    0    0 S      3  0.0   0:40.69 56 ksoftirqd/56 
   345 root      RT   0     0    0    0 S      3  0.0   0:40.83 58 ksoftirqd/58 
   201 root      RT   0     0    0    0 S      3  0.0   0:44.86 34 ksoftirqd/34 
   256 root      RT   0     0    0    0 S      3  0.0   0:41.62 45 ksoftirqd/45 
   273 root      RT   0     0    0    0 S      3  0.0   0:41.09 46 ksoftirqd/46



* Re: 3.0.14-rt31 + 64 cores = very bad jitter == highly synchronized tick?
From: Mike Galbraith @ 2011-12-27  6:40 UTC (permalink / raw)
  To: RT; +Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Sat, 2011-12-24 at 10:06 +0100, Mike Galbraith wrote:

> Has anything been done between 33 and 3.0 that would account for this?

Um, like af5ab277d for instance.

Arjan is right that this contention trouble doesn't happen with nohz..
but low jitter doesn't happen with nohz either.

	-Mike



* [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-27  9:20 UTC (permalink / raw)
  To: RT
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Arjan van de Ven


Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
Historically, Linux has tried to make the regular timer tick on the
various CPUs not happen at the same time, to avoid contention on
xtime_lock.
    
Nowadays, with the tickless kernel, this contention no longer happens
since time keeping and updating are done differently. In addition,
this skew is actually hurting power consumption in a measurable way on
many-core systems.
End quote

Contention remains a problem if NO_HZ is either not configured, or is
disabled with nohz=off due to workload constraints.  The RT kernel running
nohz=off was measured to be using > 1.4% CPU just ticking 64 CPUs, with
tick perturbation reaching ~80us.  For loads where the measured (>100us)
NO_HZ latencies are intolerable, the tick skew is a must have.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>

---
 kernel/time/tick-sched.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -689,6 +689,7 @@ static inline void tick_check_nohz(int c
 
 static inline void tick_nohz_switch_to_nohz(void) { }
 static inline void tick_check_nohz(int cpu) { }
+#define tick_nohz_enabled 0
 
 #endif /* NO_HZ */
 
@@ -777,6 +778,14 @@ void tick_setup_sched_timer(void)
 	/* Get the next period (per cpu) */
 	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
 
+	/* Offset the tick when NO_HZ is configured out or boot disabled */
+	if (!tick_nohz_enabled) {
+		u64 offset = ktime_to_ns(tick_period) >> 1;
+		do_div(offset, num_possible_cpus());
+		offset *= smp_processor_id();
+		hrtimer_add_expires_ns(&ts->sched_timer, offset);
+	}
+
 	for (;;) {
 		hrtimer_forward(&ts->sched_timer, now, tick_period);
 		hrtimer_start_expires(&ts->sched_timer,
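
To put numbers on the offset (my arithmetic, assuming HZ=1000 and 64
possible CPUs):

	/*
	 * tick_period   = 1,000,000 ns
	 * half period   = 1,000,000 >> 1 = 500,000 ns
	 * per-cpu step  = 500,000 / 64  ~= 7,812 ns
	 *
	 * CPU 0 ticks at +0, CPU 1 at ~+7.8us, ..., CPU 63 at ~+492us,
	 * i.e. the ticks are spread evenly across the first half of the
	 * period instead of all firing at the same instant.
	 */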




* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28  5:17 UTC (permalink / raw)
  To: RT
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Arjan van de Ven

On Tue, 2011-12-27 at 10:20 +0100, Mike Galbraith wrote:
> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
> Historically, Linux has tried to make the regular timer tick on the
> various CPUs not happen at the same time, to avoid contention on
> xtime_lock.
>     
> Nowadays, with the tickless kernel, this contention no longer happens
> since time keeping and updating are done differently. In addition,
> this skew is actually hurting power consumption in a measurable way on
> many-core systems.
> End quote

Hm, nohz enabled, hogs burning up 60 of 64 cores.

 56.11%  [kernel]      [k] ktime_get
  5.54%  [kernel]      [k] scheduler_tick
  4.02%  [kernel]      [k] cpuacct_charge
  3.78%  [kernel]      [k] __rcu_pending
  3.76%  [kernel]      [k] tick_sched_timer
  3.42%  [kernel]      [k] native_write_msr_safe
  1.58%  [kernel]      [k] run_timer_softirq
  1.28%  [kernel]      [k] __schedule
  1.21%  [kernel]      [k] apic_timer_interrupt
  1.07%  [kernel]      [k] _raw_spin_lock
  0.81%  [kernel]      [k] __switch_to
  0.67%  [kernel]      [k] thread_return

Maybe skew-me wants to become a boot option?

	-Mike



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28  8:22 UTC (permalink / raw)
  To: RT
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Arjan van de Ven

On Wed, 2011-12-28 at 06:17 +0100, Mike Galbraith wrote:
> On Tue, 2011-12-27 at 10:20 +0100, Mike Galbraith wrote:
> > Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
> > Historically, Linux has tried to make the regular timer tick on the
> > various CPUs not happen at the same time, to avoid contention on
> > xtime_lock.
> >     
> > Nowadays, with the tickless kernel, this contention no longer happens
> > since time keeping and updating are done differently. In addition,
> > this skew is actually hurting power consumption in a measurable way on
> > many-core systems.
> > End quote
> 
> Hm, nohz enabled, hogs burning up 60 of 64 cores.
> 
>  56.11%  [kernel]      [k] ktime_get
>   5.54%  [kernel]      [k] scheduler_tick
>   4.02%  [kernel]      [k] cpuacct_charge
>   3.78%  [kernel]      [k] __rcu_pending
>   3.76%  [kernel]      [k] tick_sched_timer
>   3.42%  [kernel]      [k] native_write_msr_safe
>   1.58%  [kernel]      [k] run_timer_softirq
>   1.28%  [kernel]      [k] __schedule
>   1.21%  [kernel]      [k] apic_timer_interrupt
>   1.07%  [kernel]      [k] _raw_spin_lock
>   0.81%  [kernel]      [k] __switch_to
>   0.67%  [kernel]      [k] thread_return
> 
> Maybe skew-me wants to become a boot option?

Yup.. or something.  As above, but with skew.

  3.06%  [kernel]      [k] ktime_get

(Hm, wonder if nohz is usable now... nope.  Tell nohz that isolated
cores don't play balancer again, maybe it'll work now)

	-Mike



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28  9:59 UTC (permalink / raw)
  To: RT
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Arjan van de Ven

On Wed, 2011-12-28 at 09:22 +0100, Mike Galbraith wrote:

> (Hm, wonder if nohz is usable now... nope.  Tell nohz that isolated
> cores don't play balancer again, maybe it'll work now)

Yup, worked.  60 core jitter test is approaching single digit.  Woohoo.

---
 kernel/sched_fair.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -4517,6 +4517,10 @@ void select_nohz_load_balancer(int stop_
 {
 	int cpu = smp_processor_id();
 
+	/* Isolated cores do not play */
+	if (!cpu_rq(cpu)->sd)
+		return;
+
 	if (stop_tick) {
 		if (!cpu_active(cpu)) {
 			if (atomic_read(&nohz.load_balancer) != cpu)





* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Arjan van de Ven @ 2011-12-28 13:32 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On 12/27/2011 10:20 AM, Mike Galbraith wrote:
> 
> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 
> Historically, Linux has tried to make the regular timer tick on
> the various CPUs not happen at the same time, to avoid contention
> on xtime_lock.
> 
> Nowadays, with the tickless kernel, this contention no longer
> happens since time keeping and updating are done differently. In
> addition, this skew is actually hurting power consumption in a
> measurable way on many-core systems. End quote
> 
> Contention remains a problem if NO_HZ is either not configured, or
> is disabled with nohz=off due to workload constraints.  The RT kernel
> running nohz=off was measured to be using > 1.4% CPU just ticking
> 64 CPUs, with tick perturbation reaching ~80us.  For loads where the
> measured (>100us) NO_HZ latencies are intolerable, the tick skew is
> a must have.

I think we need to just say no to this, and kill the nohz=off option
entirely.

Seriously, are people still running with ticks for any legitimate
reasons? (and not just because they goofed their config file)


* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Arjan van de Ven @ 2011-12-28 13:35 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On 12/28/2011 6:17 AM, Mike Galbraith wrote:
> On Tue, 2011-12-27 at 10:20 +0100, Mike Galbraith wrote:
>> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 
>> Historically, Linux has tried to make the regular timer tick on
>> the various CPUs not happen at the same time, to avoid contention
>> on xtime_lock.
>> 
>> Nowadays, with the tickless kernel, this contention no longer
>> happens since time keeping and updating are done differently. In
>> addition, this skew is actually hurting power consumption in a
>> measurable way on many-core systems. End quote
> 
> Hm, nohz enabled, hogs burning up 60 of 64 cores.
> 
> 56.11%  [kernel]      [k] ktime_get
>  5.54%  [kernel]      [k] scheduler_tick
>  4.02%  [kernel]      [k] cpuacct_charge
>  3.78%  [kernel]      [k] __rcu_pending
>  3.76%  [kernel]      [k] tick_sched_timer
>  3.42%  [kernel]      [k] native_write_msr_safe
>  1.58%  [kernel]      [k] run_timer_softirq
>  1.28%  [kernel]      [k] __schedule
>  1.21%  [kernel]      [k] apic_timer_interrupt
>  1.07%  [kernel]      [k] _raw_spin_lock
>  0.81%  [kernel]      [k] __switch_to
>  0.67%  [kernel]      [k] thread_return
> 
> Maybe skew-me wants to become a boot option?

this is 56% of kernel time.. of how much total time?

(and are you using a system where tsc/lapic can be used, or are you
using one of those boatanchors that need hpet?)



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28 14:59 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Wed, 2011-12-28 at 14:35 +0100, Arjan van de Ven wrote:
> On 12/28/2011 6:17 AM, Mike Galbraith wrote:
> > On Tue, 2011-12-27 at 10:20 +0100, Mike Galbraith wrote:
> >> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 
> >> Historically, Linux has tried to make the regular timer tick on
> >> the various CPUs not happen at the same time, to avoid contention
> >> on xtime_lock.
> >> 
> >> Nowadays, with the tickless kernel, this contention no longer
> >> happens since time keeping and updating are done differently. In
> >> addition, this skew is actually hurting power consumption in a
> >> measurable way on many-core systems. End quote
> > 
> > Hm, nohz enabled, hogs burning up 60 of 64 cores.
> > 
> > 56.11%  [kernel]      [k] ktime_get
> >  5.54%  [kernel]      [k] scheduler_tick
> >  4.02%  [kernel]      [k] cpuacct_charge
> >  3.78%  [kernel]      [k] __rcu_pending
> >  3.76%  [kernel]      [k] tick_sched_timer
> >  3.42%  [kernel]      [k] native_write_msr_safe
> >  1.58%  [kernel]      [k] run_timer_softirq
> >  1.28%  [kernel]      [k] __schedule
> >  1.21%  [kernel]      [k] apic_timer_interrupt
> >  1.07%  [kernel]      [k] _raw_spin_lock
> >  0.81%  [kernel]      [k] __switch_to
> >  0.67%  [kernel]      [k] thread_return
> > 
> > Maybe skew-me wants to become a boot option?
> 
> this is 56% of kernel time.. of how much total time?

I'd have to re-measure.  I didn't have any reason to watch the total;
that it was a big perturbation source was all that mattered.

It's not that it's a huge percentage of total time by any means, just
that the jitter induced is too large for the kernel to be usable for
the realtime load it's expected to support.  With 30 usecs to play with,
every one counts.

> (and are you using a system where tsc/lapic can be used, or are you
> using one of those boatanchors that need hpet?)

Box is an HP DL980, 64 x X7560.

	-Mike



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28 15:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Wed, 2011-12-28 at 14:32 +0100, Arjan van de Ven wrote:
> On 12/27/2011 10:20 AM, Mike Galbraith wrote:
> > 
> > Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 
> > Historically, Linux has tried to make the regular timer tick on
> > the various CPUs not happen at the same time, to avoid contention
> > on xtime_lock.
> > 
> > Nowadays, with the tickless kernel, this contention no longer
> > happens since time keeping and updating are done differently. In
> > addition, this skew is actually hurting power consumption in a
> > measurable way on many-core systems. End quote
> > 
> > Contention remains a problem if NO_HZ is either not configured, or
> > is disabled with nohz=off due to workload constraints.  The RT kernel
> > running nohz=off was measured to be using > 1.4% CPU just ticking
> > 64 CPUs, with tick perturbation reaching ~80us.  For loads where the
> > measured (>100us) NO_HZ latencies are intolerable, the tick skew is
> > a must have.
> 
> I think we need to just say no to this, and kill the nohz=off option
> entirely.
> 
> Seriously, are people still running with ticks for any legitimate
> reasons? (and not just because they goofed their config file)

Yup.  Realtime loads sometimes need it.  Even without contention
problems, entering/leaving nohz is a latency source.  If every little
bit counts, you may have the choice of letting the electric meter spin
or not getting the job done at all.

	-Mike



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Peter Zijlstra @ 2011-12-28 16:57 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Arjan van de Ven, RT, Thomas Gleixner, Steven Rostedt,
	Ingo Molnar

On Wed, 2011-12-28 at 15:59 +0100, Mike Galbraith wrote:
> 
> > (and are you using a system where tsc/lapic can be used, or are you
> > using one of those boatanchors that need hpet?)
> 
> Box is an HP DL980, 64 x X7560. 

That smells like NHM-EX, aka boatanchor.



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-28 17:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arjan van de Ven, RT, Thomas Gleixner, Steven Rostedt,
	Ingo Molnar

On Wed, 2011-12-28 at 17:57 +0100, Peter Zijlstra wrote:
> On Wed, 2011-12-28 at 15:59 +0100, Mike Galbraith wrote:
> > 
> > > (and are you using a system where tsc/lapic can be used, or are you
> > > using one of those boatanchors that need hpet?)
> > 
> > Box is an HP DL980, 64 x X7560. 
> 
> That smells like NHM-EX, aka boatanchor.

I have a 32 core test box that has a minimum itimer-fires -> task-runs
latency of 6.79 usecs.. now _that_ box would make a great boat anchor.  DL980
may be a work horse, but it ain't a broken down old nag that should be
sent off to the glue factory.. yet.

	-Mike




* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2011-12-29  7:22 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Wed, 2011-12-28 at 14:35 +0100, Arjan van de Ven wrote:
> On 12/28/2011 6:17 AM, Mike Galbraith wrote:
> > On Tue, 2011-12-27 at 10:20 +0100, Mike Galbraith wrote:
> >> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 
> >> Historically, Linux has tried to make the regular timer tick on
> >> the various CPUs not happen at the same time, to avoid contention
> >> on xtime_lock.
> >> 
> >> Nowadays, with the tickless kernel, this contention no longer
> >> happens since time keeping and updating are done differently. In
> >> addition, this skew is actually hurting power consumption in a
> >> measurable way on many-core systems. End quote
> > 
> > Hm, nohz enabled, hogs burning up 60 of 64 cores.
> > 
> > 56.11%  [kernel]      [k] ktime_get
> >  5.54%  [kernel]      [k] scheduler_tick
> >  4.02%  [kernel]      [k] cpuacct_charge
> >  3.78%  [kernel]      [k] __rcu_pending
> >  3.76%  [kernel]      [k] tick_sched_timer
> >  3.42%  [kernel]      [k] native_write_msr_safe
> >  1.58%  [kernel]      [k] run_timer_softirq
> >  1.28%  [kernel]      [k] __schedule
> >  1.21%  [kernel]      [k] apic_timer_interrupt
> >  1.07%  [kernel]      [k] _raw_spin_lock
> >  0.81%  [kernel]      [k] __switch_to
> >  0.67%  [kernel]      [k] thread_return
> > 
> > Maybe skew-me wants to become a boot option?
> 
> this is 56% of kernel time.. of how much total time?

To answer the question..

 99.57%  burn                        [.] main
  0.14%  [kernel]                    [k] ktime_get

That's the DL980 running a 250Hz kernel.  Dinky, but for my picky RT
load, too much nonetheless.  (hm, what would an SGI monster box say?)

	-Mike



* Re: [patch] clockevents: Reinstate the per cpu tick skew
From: Mike Galbraith @ 2012-01-03  6:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Wed, 2011-12-28 at 16:10 +0100, Mike Galbraith wrote:
> On Wed, 2011-12-28 at 14:32 +0100, Arjan van de Ven wrote:
> > 
> > I think we need to just say no to this, and kill the nohz=off option
> > entirely.
> > 
> > Seriously, are people still running with ticks for any legitimate
> > reasons? (and not just because they goofed their config file)
> 
> Yup.  Realtime loads sometimes need it.  Even without contention
> problems, entering/leaving nohz is a latency source.  If every little
> bit counts, you may have the choice of letting the electric meter spin
> or not getting the job done at all.

Patch making tick skew a boot option below, and hard numbers below that.

Test setup:
60 isolated cores running a synchronized frame scheduler model for 1
hour, scheduling worker-bees at three frequencies.  (The testcase is
supposed to simulate a real frame rate scheduler "well enough", and did
pretty well at showing the cost of these particular collisions.)

First set of numbers is without tick skew, and nohz enabled.  Second set
is tick skewed, nohz and rt push/pull turned off for the isolated core
set.  The tick skew alone is responsible for an order of magnitude of
jitter improvement.  I have hard numbers for nohz and cpupri_set() as
well, but bottom line for me is that with nohz enabled, my 30us jitter
budget is nearly doubled, so even with the tick skewed, nohz is just not
a viable option ATM.
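
The frame scheduler testcase itself isn't posted here; a minimal
sketch of the kind of worker it models (my reconstruction, one worker
at the 960Hz frequency, compile with -lrt on older glibc): sleep to an
absolute per-frame deadline, record how late the wakeup was.

#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

int main(void)
{
	struct timespec next, now;
	long period = NSEC_PER_SEC / 960;	/* ~1041666 ns per frame */
	double max_late_us = 0.0;
	int frame;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (frame = 0; frame < 960 * 60; frame++) {	/* one minute */
		double late_us;

		/* advance the absolute deadline by one frame */
		next.tv_nsec += period;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);

		/* how late did we wake relative to the deadline? */
		late_us = (now.tv_sec - next.tv_sec) * 1e6 +
			  (now.tv_nsec - next.tv_nsec) / 1e3;
		if (late_us > max_late_us)
			max_late_us = late_us;
	}
	printf("max wakeup jitter: %.2f us\n", max_late_us);
	return 0;
}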


From: Mike Galbraith <mgalbraith@suse.de>

clockevents: Reinstate the per cpu tick skew

Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
Historically, Linux has tried to make the regular timer tick on the
various CPUs not happen at the same time, to avoid contention on
xtime_lock.
    
Nowadays, with the tickless kernel, this contention no longer happens
since time keeping and updating are done differently. In addition,
this skew is actually hurting power consumption in a measurable way on
many-core systems.
End quote

Contrary to the above, contention does still happen, and can be a
problem for realtime loads whether nohz is active or not, so give
the user the ability to decide whether power consumption or jitter
is the more important consideration.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>

---
 Documentation/kernel-parameters.txt |    3 +++
 kernel/time/tick-sched.c            |   19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2295,6 +2295,9 @@ bytes respectively. Such letter suffixes
 	simeth=		[IA-64]
 	simscsi=
 
+	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
+			xtime_lock contention on larger systems.
+
 	slram=		[HW,MTD]
 
 	slub_debug[=options[,slabs]]	[MM, SLUB]
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -759,6 +759,8 @@ static enum hrtimer_restart tick_sched_t
 	return HRTIMER_RESTART;
 }
 
+static int sched_skew_tick;
+
 /**
  * tick_setup_sched_timer - setup the tick emulation timer
  */
@@ -777,6 +779,14 @@ void tick_setup_sched_timer(void)
 	/* Get the next period (per cpu) */
 	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
 
+	/* Offset the tick to avert xtime_lock contention. */
+	if (sched_skew_tick) {
+		u64 offset = ktime_to_ns(tick_period) >> 1;
+		do_div(offset, num_possible_cpus());
+		offset *= smp_processor_id();
+		hrtimer_add_expires_ns(&ts->sched_timer, offset);
+	}
+
 	for (;;) {
 		hrtimer_forward(&ts->sched_timer, now, tick_period);
 		hrtimer_start_expires(&ts->sched_timer,
@@ -858,3 +868,12 @@ int tick_check_oneshot_change(int allow_
 	tick_nohz_switch_to_nohz();
 	return 0;
 }
+
+static int __init skew_tick(char *str)
+{
+	get_option(&str, &sched_skew_tick);
+
+	return 0;
+}
+early_param("skew_tick", skew_tick);
+
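
Usage, for completeness (my summary of the patch above): the skew
defaults to off, and is enabled by booting with

	skew_tick=1

on the kernel command line.  On this box (64 CPUs, 250Hz kernel) that
spreads the per-cpu ticks ~31us apart: 4ms tick period, 2ms half
period, divided by 64 CPUs ~= 31.25us per step.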

No skewed tick, nohz active:
FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   3456000   0.0159  51.51 (1751285) 1.0811  2.3215    0 (0)     940 (2496,2497,36625,36626,45649,..3438632)
5   3456000   0.0159  57.44 (1301949) 1.1164  2.3599    0 (0)     1010 (32353,32354,36625,36626,43681,..3434312)
6   3456000   0.0159  49.58 (546753)  1.0602  2.3222    0 (0)     1037 (32353,32354,36625,36626,41809,..3425240)
7   3456000   0.0159  52.20 (546753)  1.0681  2.3370    0 (0)     1035 (32353,32354,36625,36626,41809,..3432248)
8   3456000   0.0159  58.91 (1407504) 1.0592  2.0873    0 (0)     865 (11041,11042,15505,15506,25585,..3412208)
9   3456000   0.0159  54.61 (1407504) 1.0581  2.0775    0 (0)     850 (11041,11042,15505,15506,20234,..3411272)
10  3456000   0.0159  52.91 (1338694) 1.1259  2.0825    0 (0)     799 (11041,11042,15505,15506,16465,..3400640)
11  3456000   0.0159  50.56 (2470554) 1.1881  2.0364    0 (0)     334 (50714,113715,113716,166349,178780,..3421185)
12  3456000   0.0159  50.29 (2462200) 0.9961  2.0202    0 (0)     639 (9337,9338,11041,11042,15505,..3452529)
13  3456000   0.0159  56.52 (2470554) 1.1478  2.0602    0 (0)     400 (2545,2546,9121,9122,66434,..3440289)
14  3456000   0.0159  55.06 (34587)   1.2129  2.4890    0 (0)     444 (34587,34588,62571,62572,62619,..3440434)
15  3456000   0.0159  46.48 (583883)  1.2891  2.1824    0 (0)     306 (91563,95739,95740,141197,155741,..3406785)
16  3456000   0.0159  103.70 (2828662)2.1077  4.0380    410 (2)   9435 (697,698,1105,1106,1153,..3455937)
17  3456000   0.0159  73.89 (2470553) 2.1598  3.7529    0 (0)     6180 (2473,2474,3985,3986,8569,..3438201)
18  3456000   0.0159  54.14 (1212190) 2.2391  3.7075    0 (0)     5485 (10274,10275,13970,13971,14379,..3455794)
19  3456000   0.0159  99.20 (810712)  2.3861  4.5793    0 (0)     19845 (674,675,2259,2260,3554,..3455915)
20  3456000   0.0159  71.30 (631597)  2.2565  4.3141    0 (0)     9365 (674,675,3555,7394,7395,..3455914)
21  3456000   0.0159  71.51 (1431073) 2.3127  4.4810    0 (0)     25073 (1154,2259,2260,4011,4012,..3455963)
22  3456000   0.0159  62.45 (215262)  2.1318  4.3088    0 (0)     23570 (2259,2260,4011,4012,4539,..3455963)
23  3456000   0.0159  61.50 (212190)  2.1307  4.3165    0 (0)     23605 (2259,2260,4539,4540,5019,..3455963)
24  2397600   0.0587  145.26 (2229318)2.6808  6.2104    492 (14)  32977 (812,813,1145,1470,1471,..2397564)
25  2397600   0.0587  133.93 (250966) 2.6171  6.3300    492 (13)  35463 (812,813,1145,1146,1462,..2397564)
26  2397600   0.0587  140.25 (1405878)2.7079  6.1603    492 (12)  32428 (806,812,813,1145,1146,..2397564)
27  2397600   0.0587  141.56 (1405879)2.6893  6.1515    492 (14)  32089 (808,809,810,811,812,..2397564)
28  2397600   0.0587  146.57 (1405879)2.7129  6.0797    492 (14)  31637 (800,801,812,813,827,..2397564)
29  2397600   0.0587  137.99 (2172039)2.3360  5.9859    492 (14)  30551 (826,827,1157,1480,1481,..2397564)
30  2397600   0.0587  144.06 (948198) 2.2381  5.0413    496 (6)   19401 (826,827,832,833,1175,..2397566)
31  2397600   0.0587  141.92 (948198) 2.2509  5.0654    496 (4)   19353 (826,827,832,833,1175,..2397566)
32  2397600   0.0587  149.31 (2172038)2.7842  6.8891    492 (10)  41301 (822,823,824,825,826,..2397564)
33  2397600   0.0587  142.99 (1975198)2.6904  5.3538    181 (6)   21954 (511,512,846,847,1175,..2397582)
34  2397600   0.0587  167.07 (948199) 2.6350  5.6616    179 (4)   23602 (503,504,507,508,511,..2397582)
35  2397600   0.0587  79.81 (2152123) 2.5135  4.1781    0 (0)     5406 (1879,1881,1882,2876,2877,..2396956)
36  2397600   0.0587  112.24 (1184061)2.7419  5.3774    0 (0)     21005 (1185,1186,1189,1190,1518,..2397263)
37  2397600   0.0587  78.86 (986867)  2.6678  5.1954    0 (0)     19350 (529,530,861,863,1189,..2397263)
38  2397600   0.0587  77.90 (1782680) 2.5881  4.8399    0 (0)     13516 (525,526,529,530,860,..2396938)
39  2397600   0.0587  78.02 (1642135) 2.4351  3.8095    0 (0)     3569 (898,2900,2901,3561,3566,..2397291)
40  2397600   0.0587  218.81 (891116) 2.7215  6.6456    392 (8)   38961 (714,715,726,727,1046,..2397450)
41  2397600   0.0587  141.56 (1975198)2.6441  5.2995    181 (4)   22572 (846,847,1179,1180,1185,..2397249)
42  2397600   0.0587  77.07 (1782679) 2.3957  5.0119    0 (0)     17798 (529,530,860,861,862,..2397263)
43  2397600   0.0587  81.72 (1333323) 2.3469  4.5082    0 (0)     11172 (1205,1206,1207,1208,1865,..2396552)
44  1080000   0.0032  168.33 (988438) 2.7037  7.1729    381 (10)  20368 (650,651,662,663,809,..1056079)
45  1080000   0.0032  156.88 (935898) 2.6181  7.1047    0 (0)     19932 (767,768,809,810,866,..1022038)
46  1080000   0.0032  156.40 (935898) 2.2137  6.8080    0 (0)     18522 (684567,684568,695466,695467,699570,..975856)
47  1080000   0.0032  150.20 (905448) 2.6011  7.0525    0 (0)     19427 (2012,2013,510347,510348,617324,..980947)
48  1080000   0.0032  163.08 (1012102) 3.0856  8.6857    491 (49)  32197 (527,528,536,537,545,..1059883)
49  1080000   0.0032  151.87 (861738) 2.1150  6.2499    0 (0)     14993 (679920,679921,681762,681763,684567,..889561)
50  1080000   0.0032  143.53 (843639) 2.3864  6.2304    0 (0)     14372 (673311,673312,676716,676717,679680,..907048)
51  1080000   0.0032  148.53 (815289) 2.4022  6.1284    0 (0)     13945 (667971,667972,672835,673311,673312,..925077)
52  1080000   0.0032  149.49 (815289) 2.4059  6.0745    0 (0)     13932 (667971,667972,672834,672835,673311,..925077)
53  1080000   0.0032  149.49 (788680) 2.2976  5.4171    0 (0)     10821 (662766,662767,664794,664795,667971,..851374)
54  1080000   0.0032  146.63 (788680) 2.1600  5.5494    0 (0)     11435 (662766,662767,664794,664795,667971,..925077)
55  1080000   0.0032  145.91 (817180) 2.3747  5.9131    0 (0)     13198 (664794,664795,667971,667972,672834,..925077)
56  1080000   0.0032  140.91 (788680) 2.4499  5.8216    0 (0)     13403 (641917,658567,662767,664794,664795,..925077)
57  1080000   0.0032  141.38 (707776) 1.2948  3.8831    0 (0)     5041 (654816,654817,658320,658321,658566,..757666)
58  1080000   0.0032  149.73 (707776) 1.2131  3.6946    0 (0)     4076 (641916,641917,654136,654816,654817,..739225)
59  1080000   0.0032  51.02 (220341)  1.3073  3.1542    0 (0)     1869 (138187,145140,145141,147822,147823,..1021026)
60  1080000   0.0032  119.93 (313205) 1.6518  5.2116    0 (0)     9504 (3019,3020,12955,12956,25645,..1078275)
61  1080000   0.0032  149.25 (707776) 1.2933  3.5546    0 (0)     3393 (631761,631762,641916,641917,647521,..732562)
62  1080000   0.0032  126.60 (222973) 2.0194  5.6079    0 (0)     11357 (3019,3020,12955,12956,14420,..1078275)
63  1080000   0.0032  126.60 (222973) 2.0223  5.6224    0 (0)     11452 (3019,3020,12955,12956,14420,..1078275)

Same kernel, tick skew enabled, nohz and rt push/pull (pointless for a
100% pinned load) disabled for the isolated cpuset.  This is 10us or so
better than 33-rt can do on this box with nohz=off, i.e. that ~10us is
roughly the jitter that cpupri_set() induces (and which, it seems, can
very rarely double).
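
The skew arithmetic in the tick skew patch (quoted in full further down
the thread) boils down to offset = (tick_period / 2) / num_possible_cpus()
* cpu.  A quick userspace rendition of it; HZ=1000 and 64 possible cpus
are assumed purely for illustration, this is a sketch, not kernel code:

#include <stdio.h>

/*
 * Per cpu tick offset as the skew patch computes it:
 * offset = (tick_period / 2) / num_possible_cpus() * cpu.
 * HZ=1000 and 64 possible cpus are illustrative assumptions.
 */
int main(void)
{
	const long long tick_period_ns = 1000000000LL / 1000;	/* HZ=1000 */
	const int ncpus = 64;
	const long long step = (tick_period_ns / 2) / ncpus;	/* ~7812ns */
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		printf("cpu %2d tick offset: %lld ns\n", cpu, cpu * step);
	return 0;
}

With those numbers the 64 ticks land ~7.8us apart, spread across half a
tick period, instead of every cpu hitting the xtime_lock path at the
same instant.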

So with a couple of little tweaks, 3.0-rt performs better than 33-rt
(and can dynamically become "green" again when not running a picky rt
load) despite being a little fatter.  'Course if I applied the same
dinky tweaks to 33-rt, the weight gain would show.  Anyway, the numbers..

FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
4   3456000   0.0159  5.98 (1957035)  0.1275  0.2979    0 (0)     
5   3456000   0.0159  6.21 (2641598)  0.2173  0.3444    0 (0)     
6   3456000   0.0159  5.26 (1313825)  0.1599  0.2956    0 (0)     
7   3456000   0.0159  5.98 (346106)   0.1632  0.2877    0 (0)     
8   3456000   0.0159  5.50 (70893)    0.1437  0.3450    0 (0)     
9   3456000   0.0159  5.98 (1550901)  0.1381  0.3502    0 (0)     
10  3456000   0.0159  5.74 (106100)   0.1478  0.3313    0 (0)     
11  3456000   0.0159  5.71 (3174550)  0.1413  0.3090    0 (0)     
12  3456000   0.0159  5.02 (1506694)  0.1761  0.3098    0 (0)     
13  3456000   0.0159  5.71 (3054611)  0.1768  0.3546    0 (0)     
14  3456000   0.0159  5.02 (3148871)  0.1299  0.3062    0 (0)     
15  3456000   0.0159  4.99 (2122036)  0.1521  0.3132    0 (0)     
16  3456000   0.0159  6.42 (1728959)  0.1521  0.3905    0 (0)     
17  3456000   0.0159  6.21 (854434)   0.1618  0.3652    0 (0)     
18  3456000   0.0159  6.93 (2190440)  0.1418  0.3548    0 (0)     
19  3456000   0.0159  6.90 (1614252)  0.2075  0.4128    0 (0)     
20  3456000   0.0159  5.47 (136316)   0.2002  0.3977    0 (0)     
21  3456000   0.0159  6.69 (1057262)  0.1435  0.3475    0 (0)     
22  3456000   0.0159  6.66 (3123382)  0.1602  0.3585    0 (0)     
23  3456000   0.0159  5.94 (2297025)  0.2283  0.3616    0 (0)     
24  2397600   0.0587  6.38 (991357)   0.2580  0.3817    0 (0)     
25  2397600   0.0587  6.73 (1162518)  0.2380  0.3730    0 (0)     
26  2397600   0.0587  7.21 (733474)   0.2502  0.3590    0 (0)     
27  2397600   0.0587  6.86 (1873716)  0.2280  0.3768    0 (0)     
28  2397600   0.0587  7.21 (2296767)  0.2521  0.3884    0 (0)     
29  2397600   0.0587  7.21 (616888)   0.4165  0.4887    0 (0)     
30  2397600   0.0587  7.09 (458995)   0.4245  0.4577    0 (0)     
31  2397600   0.0587  6.14 (1674893)  0.3974  0.4544    0 (0)     
32  2397600   0.0587  7.45 (130233)   0.4440  0.5456    0 (0)     
33  2397600   0.0587  7.09 (1453350)  0.2482  0.3813    0 (0)     
34  2397600   0.0587  6.73 (2365066)  0.2886  0.3827    0 (0)     
35  2397600   0.0587  6.14 (35955)    0.2556  0.3841    0 (0)     
36  2397600   0.0587  6.62 (2145554)  0.2566  0.3933    0 (0)     
37  2397600   0.0587  7.81 (130234)   0.5375  0.5129    0 (0)     
38  2397600   0.0587  7.33 (130234)   0.4921  0.5255    0 (0)     
39  2397600   0.0587  7.57 (130234)   0.4200  0.4901    0 (0)     
40  2397600   0.0587  6.62 (2367859)  0.2962  0.4553    0 (0)     
41  2397600   0.0587  6.26 (206979)   0.5036  0.5491    0 (0)     
42  2397600   0.0587  6.38 (1302660)  0.5093  0.5469    0 (0)     
43  2397600   0.0587  6.73 (1825681)  0.5511  0.5734    0 (0)     
44  1079999   0.0032  7.39 (91927)    0.4603  0.5291    0 (0)     
45  1079999   0.0032  6.92 (977865)   0.3143  0.4378    0 (0)     
46  1079999   0.0032  5.96 (1002473)  0.2129  0.3999    0 (0)     
47  1079999   0.0032  6.44 (981423)   0.4193  0.5293    0 (0)     
48  1079999   0.0032  6.20 (375165)   0.2602  0.4201    0 (0)     
49  1079999   0.0032  5.73 (886536)   0.4002  0.5174    0 (0)     
50  1079999   0.0032  6.44 (547629)   0.3182  0.4507    0 (0)     
51  1079999   0.0032  5.73 (143994)   0.4736  0.5952    0 (0)     
52  1079999   0.0032  6.68 (1053525)  0.4753  0.5132    0 (0)     
53  1079999   0.0032  6.44 (378576)   0.3686  0.4691    0 (0)     
54  1079999   0.0032  6.92 (886639)   0.6017  0.5538    0 (0)     
55  1079999   0.0032  6.68 (1055655)  0.4917  0.5232    0 (0)     
56  1079999   0.0032  6.44 (293526)   0.2752  0.4340    0 (0)     
57  1079999   0.0032  8.59 (913209)   1.1433  0.8550    0 (0)     
58  1079999   0.0032  5.25 (259824)   0.2139  0.3702    0 (0)     
59  1079999   0.0032  6.68 (245211)   0.2031  0.3665    0 (0)     
60  1079999   0.0032  6.44 (895440)   0.4445  0.4867    0 (0)     
61  1079999   0.0032  5.96 (896382)   0.2541  0.3923    0 (0)     
62  1079999   0.0032  7.16 (895440)   0.5437  0.5162    0 (0)     
63  1079999   0.0032  6.44 (895371)   0.5707  0.5135    0 (0)
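
For anyone wanting the general shape of the measurement: it boils down
to sleeping to an absolute per-frame deadline and recording how late
each wakeup was.  A bare-bones sketch follows; FREQ=960, a one minute
run and worst-case-only bookkeeping are illustrative assumptions, and
this is not the actual testcase (which also keeps min/avg/sigma and
flier counts per frame):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

int main(void)
{
	const long long period_ns = NSEC_PER_SEC / 960;	/* FREQ=960 */
	struct timespec next, now;
	double worst_us = 0.0;
	long frame;

	/* First deadline is one period from now. */
	clock_gettime(CLOCK_MONOTONIC, &next);
	for (frame = 0; frame < 960 * 60; frame++) {
		next.tv_nsec += period_ns;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		/* Sleep to the absolute deadline, then see how late we woke. */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);
		double late_us = ((now.tv_sec - next.tv_sec) * NSEC_PER_SEC +
				  (now.tv_nsec - next.tv_nsec)) / 1000.0;
		if (late_us > worst_us)
			worst_us = late_us;
	}
	printf("worst wakeup jitter: %.2f us\n", worst_us);
	return 0;
}

Run one instance pinned to each isolated core at rt priority, and the
worst case it reports is roughly what the Max column above shows per
cpu.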

So IMHO there is a valid case for keeping NO_HZ a config option for
folks who can never tolerate the price tag, but as for the nohz=off
boot option, methinks that could indeed go away, given it's easy to
make a runtime on/off switch.  I made one for both nohz and rt
push/pull; I just need to move it into cpusets and make it pretty
enough to live.
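
For illustration only, the simplest form such a switch can take is a
/proc/sys knob.  The real thing wants to live in cpusets as said, so
treat the below as a sketch with made-up names, not the actual switch:

#include <linux/module.h>
#include <linux/sysctl.h>
#include <linux/errno.h>

/* Hypothetical on/off knob; something in the tick/push-pull paths
 * would need to read it, only the switch itself is shown. */
static int rt_tick_knob = 1;

static struct ctl_table knob_table[] = {
	{
		.procname	= "rt_tick_knob",
		.data		= &rt_tick_knob,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

static struct ctl_table knob_root[] = {
	{
		.procname	= "kernel",
		.mode		= 0555,
		.child		= knob_table,
	},
	{ }
};

static struct ctl_table_header *knob_hdr;

static int __init knob_init(void)
{
	knob_hdr = register_sysctl_table(knob_root);
	return knob_hdr ? 0 : -ENOMEM;
}

static void __exit knob_exit(void)
{
	unregister_sysctl_table(knob_hdr);
}

module_init(knob_init);
module_exit(knob_exit);
MODULE_LICENSE("GPL");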

WRT $subject, it seems pretty clear that the RT kernel either wants tick
skew back.. or collision avoidance radar.. or something.

	-Mike



* irq latency regression post af5ab277 - was Re: [patch] clockevents: Reinstate the per cpu tick skew
  2012-01-03  6:20         ` Mike Galbraith
@ 2012-04-23  6:13           ` Mike Galbraith
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Galbraith @ 2012-04-23  6:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: RT, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	LKML, Paul E. McKenney, Dimitri Sivanich

Greetings,

On Tue, 2012-01-03 at 07:20 +0100, Mike Galbraith wrote: 
> On Wed, 2011-12-28 at 16:10 +0100, Mike Galbraith wrote:
> > On Wed, 2011-12-28 at 14:32 +0100, Arjan van de Ven wrote:
> > > 
> > > I think we need to just say no to this, and kill the nohz=off option
> > > entirely.
> > > 
> > > Seriously, are people still running with ticks for any legitimate
> > > reasons? (and not just because they goofed their config file)
> > 
> > Yup.  Realtime loads sometimes need it.  Even without contention
> > problems, entering/leaving nohz is a latency source.  If every little
> > bit counts, you may have the choice of letting the electric meter spin
> > or not getting the job done at all.

There are other facets to tick skew removal that have turned up while
looking into an irq latency regression from 2.6.32 to 3.0.  Not only
does skew removal induce jitter woes for moderate-sized boxen running
RT kernels, it's a jitter source for large machines in general.

More interestingly, that skew removal also appears to be indirectly
responsible for a rather large irq latency regression.  I bisected the
source of same to..

0209f649 rcu: limit rcu_node leaf-level fanout

.._but_, the source of the lock contention that commit addressed
appears to be the very tick skew removal that caused my xtime_lock
jitter woes in RT.  Revert 0209f649 in a CONFIG_MAXSMP
CONFIG_PREEMPT_NONE kernel and the contention appears; restore the
skew and it disappears virtually entirely.  (Which makes sense: with
every cpu ticking in lockstep, the cpus sharing a leaf rcu_node
presumably bang the same lock at the same instant.)  So it would
appear we induced a ~400% latency regression to combat contention
that was itself induced by tick skew removal.

In the enterprise kernel, I can revert 0209f649 and enable tick skew
across the board instead of selectively, killing the regression at the
cost of losing whatever power savings the skew removal bought us.  May
have to do that.  In another thread, Paul suggested limiting RCU
grace-period initialization to CPUs that have actually been online,
which indeed turned the regression into a modest progression.  That's
highly attractive long term, but doing it in a stable kernel before
it's baked in mainline is not the least bit attractive.  Hohum, rock
or hard spot, pick one.

Anyway, I thought I should summarize the linkage of this RCU-induced
latency regression to tick skew removal.  Seems likely I'm not the only
sod who will have this land in their bug list.

> Patch making tick skew a boot option below, and hard numbers below that.
> 
> Test setup:
> 60 isolated cores running a synchronized frame scheduler model for 1
> hour, scheduling worker-bees at three frequencies.  (The testcase is
> meant to simulate a real frame rate scheduler "well enough", and did
> pretty well at showing the cost of these particular collisions.)
> 
> First set of numbers is without tick skew, and nohz enabled.  Second set
> is tick skewed, nohz and rt push/pull turned off for the isolated core
> set.  The tick skew alone is responsible for an order of magnitude of
> jitter improvement.  I have hard numbers for nohz and cpupri_set() as
> well, but bottom line for me is that with nohz enabled, my 30us jitter
> budget is nearly doubled, so even with the tick skewed, nohz is just not
> a viable option ATM.
> 
> 
> From: Mike Galbraith <mgalbraith@suse.de>
> 
> clockevents: Reinstate the per cpu tick skew
> 
> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
> Historically, Linux has tried to make the regular timer tick on the
> various CPUs not happen at the same time, to avoid contention on
> xtime_lock.
>     
> Nowadays, with the tickless kernel, this contention no longer happens
> since time keeping and updating are done differently. In addition,
> this skew is actually hurting power consumption in a measurable way on
> many-core systems.
> End quote
> 
> Contrary to the above, contention does still happen, and can be a
> problem for realtime loads whether nohz is active or not, so give
> the user the ability to decide whether power consumption or jitter
> is the more important consideration.
> 
> Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> 
> ---
>  Documentation/kernel-parameters.txt |    3 +++
>  kernel/time/tick-sched.c            |   19 +++++++++++++++++++
>  2 files changed, 22 insertions(+)
> 
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2295,6 +2295,9 @@ bytes respectively. Such letter suffixes
>  	simeth=		[IA-64]
>  	simscsi=
>  
> +	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
> +			xtime_lock contention on larger systems.
> +
>  	slram=		[HW,MTD]
>  
>  	slub_debug[=options[,slabs]]	[MM, SLUB]
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -759,6 +759,8 @@ static enum hrtimer_restart tick_sched_t
>  	return HRTIMER_RESTART;
>  }
>  
> +static int sched_skew_tick;
> +
>  /**
>   * tick_setup_sched_timer - setup the tick emulation timer
>   */
> @@ -777,6 +779,14 @@ void tick_setup_sched_timer(void)
>  	/* Get the next period (per cpu) */
>  	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
>  
> +	/* Offset the tick to avert xtime_lock contention. */
> +	if (sched_skew_tick) {
> +		u64 offset = ktime_to_ns(tick_period) >> 1;
> +		do_div(offset, num_possible_cpus());
> +		offset *= smp_processor_id();
> +		hrtimer_add_expires_ns(&ts->sched_timer, offset);
> +	}
> +
>  	for (;;) {
>  		hrtimer_forward(&ts->sched_timer, now, tick_period);
>  		hrtimer_start_expires(&ts->sched_timer,
> @@ -858,3 +868,12 @@ int tick_check_oneshot_change(int allow_
>  	tick_nohz_switch_to_nohz();
>  	return 0;
>  }
> +
> +static int __init skew_tick(char *str)
> +{
> +	get_option(&str, &sched_skew_tick);
> +
> +	return 0;
> +}
> +early_param("skew_tick", skew_tick);
> +
> 
> [benchmark tables and closing remarks snipped; verbatim duplicates of
>  the numbers and summary in the 2012-01-03 message above]



