* dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-12 8:17 UTC
To: LKML; +Cc: Frederic Weisbecker, Paul E. McKenney

Greetings,

Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
accrue zero utime/stime. However, the same exact kernel on E5620 box
works fine, so it would appear there's a CPU dependency somewhere.

Is core2 expected to go dysfunctional with context tracking enabled?
CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
boxen only, same exact kernel continues to work just fine on E5620
(Westmere) box.

-Mike

marge:/usr/local/src/kernel/linux-3.9 # egrep 'NO_HR|CPU_ACCOUNTING|RCU| CONTEXT' .config
CONFIG_VIRT_CPU_ACCOUNTING=y
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_RCU_USER_QS is not set
CONFIG_CONTEXT_TRACKING_FORCE=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_NOCB_CPU is not set
CONFIG_HAVE_CONTEXT_TRACKING=y
# RCU Debugging
# CONFIG_SPARSE_RCU_POINTER is not set
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
# CONFIG_RCU_CPU_STALL_INFO is not set
# CONFIG_RCU_TRACE is not set
CONFIG_CONTEXT_SWITCH_TRACER=y
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-14 0:57 UTC
To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> Greetings,
>
> Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> accrue zero utime/stime. However, the same exact kernel on E5620 box
> works fine, so it would appear there's a CPU dependency somewhere.

Ah indeed, I just managed to reproduce the same issue.

>
> Is core2 expected to go dysfunctional with context tracking enabled?
> CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> boxen only, same exact kernel continues to work just fine on E5620
> (Westmere) box.

There was no known issue with core2. The box where I'm seeing it
is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
likely to happen with this low number. I don't know.

I'm investigating further.

Thanks.
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-14 7:37 UTC
To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > Greetings,
> >
> > Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> > accrue zero utime/stime. However, the same exact kernel on E5620 box
> > works fine, so it would appear there's a CPU dependency somewhere.
>
> Ah indeed, I just managed to reproduce the same issue.
>
> >
> > Is core2 expected to go dysfunctional with context tracking enabled?
> > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > boxen only, same exact kernel continues to work just fine on E5620
> > (Westmere) box.
>
> There was no known issue with core2. The box where I'm seeing it
> is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> likely to happen with this low number. I don't know.
>
> I'm investigating further.

Me too.

bash-6023 [001] d... 290.494214: vtime_delta: clock: 289702961236 vtime_snap: 290493017701

Always. Not good.

I see..
	current->vtime_snap = sched_clock();
and..
	clock = local_clock();

Things that make ya go hmm.

The below "fixes" it (not).
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index cc2dc3ee..3133665 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -634,14 +634,17 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static unsigned long long vtime_delta(struct task_struct *tsk)
+static noinline unsigned long long vtime_delta(struct task_struct *tsk)
 {
 	unsigned long long clock;
 
-	clock = local_clock();
+//	clock = local_clock();
+	clock = sched_clock();
+	trace_printk("clock: %Lu vtime_snap: %Lu\n", clock, tsk->vtime_snap);
 	if (clock < tsk->vtime_snap)
 		return 0;
 
+	trace_printk("clock: %Lu vtime_snap: %Lu returns :%Lu\n", clock, tsk->vtime_snap, clock - tsk->vtime_snap);
 	return clock - tsk->vtime_snap;
 }
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-14 14:07 UTC
To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > Greetings,
> >
> > Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> > accrue zero utime/stime. However, the same exact kernel on E5620 box
> > works fine, so it would appear there's a CPU dependency somewhere.
>
> Ah indeed, I just managed to reproduce the same issue.
>
> >
> > Is core2 expected to go dysfunctional with context tracking enabled?
> > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > boxen only, same exact kernel continues to work just fine on E5620
> > (Westmere) box.
>
> There was no known issue with core2. The box where I'm seeing it
> is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> likely to happen with this low number. I don't know.
>
> I'm investigating further.

So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
(pure tsc) with local_clock()/sched_clock_cpu(cpu). The former is
always quite a bit ahead of the latter, so mixing clocks is a no-go on
crusty old (but beloved) core2 box.

-Mike
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-15 0:26 UTC
To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > Greetings,
> > >
> > > Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > accrue zero utime/stime. However, the same exact kernel on E5620 box
> > > works fine, so it would appear there's a CPU dependency somewhere.
> >
> > Ah indeed, I just managed to reproduce the same issue.
> >
> > >
> > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > boxen only, same exact kernel continues to work just fine on E5620
> > > (Westmere) box.
> >
> > There was no known issue with core2. The box where I'm seeing it
> > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > likely to happen with this low number. I don't know.
> >
> > I'm investigating further.
>
> So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> (pure tsc) with local_clock()/sched_clock_cpu(cpu). The former is
> always quite a bit ahead of the latter, so mixing clocks is a no-go on
> crusty old (but beloved) core2 box.

Right, I have the same issue. So let's use local_clock() everywhere here,
it takes care of unstable tsc.

Does the following fix the issue for you?
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index cc2dc3e..1ce322f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -747,7 +748,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	write_seqlock(&current->vtime_seqlock);
 	current->vtime_snap_whence = VTIME_SYS;
-	current->vtime_snap = sched_clock();
+	current->vtime_snap = local_clock();
 	write_sequnlock(&current->vtime_seqlock);
 }
 
@@ -757,7 +758,7 @@ void vtime_init_idle(struct task_struct *t)
 
 	write_seqlock_irqsave(&t->vtime_seqlock, flags);
 	t->vtime_snap_whence = VTIME_SYS;
-	t->vtime_snap = sched_clock();
+	t->vtime_snap = local_clock();
 	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
 }
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-15 4:09 UTC
To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Wed, 2013-05-15 at 02:26 +0200, Frederic Weisbecker wrote:
> On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> > On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> > > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > > Greetings,
> > > >
> > > > Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > > accrue zero utime/stime. However, the same exact kernel on E5620 box
> > > > works fine, so it would appear there's a CPU dependency somewhere.
> > >
> > > Ah indeed, I just managed to reproduce the same issue.
> > >
> > > >
> > > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > > boxen only, same exact kernel continues to work just fine on E5620
> > > > (Westmere) box.
> > >
> > > There was no known issue with core2. The box where I'm seeing it
> > > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > > likely to happen with this low number. I don't know.
> > >
> > > I'm investigating further.
> >
> > So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> > (pure tsc) with local_clock()/sched_clock_cpu(cpu). The former is
> > always quite a bit ahead of the latter, so mixing clocks is a no-go on
> > crusty old (but beloved) core2 box.
>
> Right, I have the same issue. So let's use local_clock() everywhere here,
> it takes care of unstable tsc.
>
> Does the following fix the issue for you?

Yeah, both can use sched_clock_cpu() instead though.

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index cc2dc3e..1ce322f 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -747,7 +748,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
>  
>  	write_seqlock(&current->vtime_seqlock);
>  	current->vtime_snap_whence = VTIME_SYS;
> -	current->vtime_snap = sched_clock();
> +	current->vtime_snap = local_clock();
>  	write_sequnlock(&current->vtime_seqlock);
>  }
>  
> @@ -757,7 +758,7 @@ void vtime_init_idle(struct task_struct *t)
>  
>  	write_seqlock_irqsave(&t->vtime_seqlock, flags);
>  	t->vtime_snap_whence = VTIME_SYS;
> -	t->vtime_snap = sched_clock();
> +	t->vtime_snap = local_clock();
>  	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
>  }
>
* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING + CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-15 16:05 UTC
To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Wed, May 15, 2013 at 06:09:15AM +0200, Mike Galbraith wrote:
> On Wed, 2013-05-15 at 02:26 +0200, Frederic Weisbecker wrote:
> > On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> > > On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> > > > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > > > Greetings,
> > > > >
> > > > > Turning on the new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > > > accrue zero utime/stime. However, the same exact kernel on E5620 box
> > > > > works fine, so it would appear there's a CPU dependency somewhere.
> > > >
> > > > Ah indeed, I just managed to reproduce the same issue.
> > > >
> > > > >
> > > > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > > > boxen only, same exact kernel continues to work just fine on E5620
> > > > > (Westmere) box.
> > > >
> > > > There was no known issue with core2. The box where I'm seeing it
> > > > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > > > likely to happen with this low number. I don't know.
> > > >
> > > > I'm investigating further.
> > >
> > > So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> > > (pure tsc) with local_clock()/sched_clock_cpu(cpu). The former is
> > > always quite a bit ahead of the latter, so mixing clocks is a no-go on
> > > crusty old (but beloved) core2 box.
> >
> > Right, I have the same issue. So let's use local_clock() everywhere here,
> > it takes care of unstable tsc.
> >
> > Does the following fix the issue for you?
>
> Yeah, both can use sched_clock_cpu() instead though.

Right, given that irqs are already disabled.

I'm preparing the patch.

Thanks!