public inbox for linux-kernel@vger.kernel.org
* dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-12  8:17 UTC (permalink / raw)
  To: LKML; +Cc: Frederic Weisbecker, Paul E. McKenney

Greetings,

Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
accrue zero utime/stime.  However, the same exact kernel on E5620 box
works fine, so it would appear there's a CPU dependency somewhere.

Is core2 expected to go dysfunctional with context tracking enabled?
CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
boxen only, same exact kernel continues to work just fine on E5620
(Westmere) box.

-Mike 

marge:/usr/local/src/kernel/linux-3.9 # egrep 'NO_HR|CPU_ACCOUNTING|RCU|CONTEXT' .config
CONFIG_VIRT_CPU_ACCOUNTING=y
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_RCU_USER_QS is not set
CONFIG_CONTEXT_TRACKING_FORCE=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_NOCB_CPU is not set
CONFIG_HAVE_CONTEXT_TRACKING=y
# RCU Debugging
# CONFIG_SPARSE_RCU_POINTER is not set
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
# CONFIG_RCU_CPU_STALL_INFO is not set
# CONFIG_RCU_TRACE is not set
CONFIG_CONTEXT_SWITCH_TRACER=y



* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-14  0:57 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> Greetings,
> 
> Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> accrue zero utime/stime.  However, the same exact kernel on E5620 box
> works fine, so it would appear there's a CPU dependency somewhere.

Ah indeed, I just managed to reproduce the same issue.

> 
> Is core2 expected to go dysfunctional with context tracking enabled?
> CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> boxen only, same exact kernel continues to work just fine on E5620
> (Westmere) box.

There was no known issue with core2. The box where I'm seeing it
is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
likely to happen with this low number. I don't know.

I'm investigating further.

Thanks.


* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-14  7:37 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote:
> On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > Greetings,
> > 
> > Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> > accrue zero utime/stime.  However, the same exact kernel on E5620 box
> > works fine, so it would appear there's a CPU dependency somewhere.
> 
> Ah indeed, I just managed to reproduce the same issue.
> 
> > 
> > Is core2 expected to go dysfunctional with context tracking enabled?
> > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > boxen only, same exact kernel continues to work just fine on E5620
> > (Westmere) box.
> 
> There was no known issue with core2. The box where I'm seeing it
> is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> likely to happen with this low number. I don't know.
> 
> I'm investigating further.

Me too.

bash-6023  [001] d...   290.494214: vtime_delta: clock: 289702961236 vtime_snap: 290493017701

Always.  Not good.

I see..

current->vtime_snap = sched_clock();

and..

clock = local_clock();

Things that make ya go hmm.  The below "fixes" it (not).

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index cc2dc3ee..3133665 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -634,14 +634,17 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static unsigned long long vtime_delta(struct task_struct *tsk)
+static noinline unsigned long long vtime_delta(struct task_struct *tsk)
 {
 	unsigned long long clock;
 
-	clock = local_clock();
+//	clock = local_clock();
+	clock = sched_clock();
+	trace_printk("clock: %Lu vtime_snap: %Lu\n", clock, tsk->vtime_snap);
 	if (clock < tsk->vtime_snap)
 		return 0;
 
+	trace_printk("clock: %Lu vtime_snap: %Lu returns :%Lu\n", clock, tsk->vtime_snap, clock - tsk->vtime_snap);
 	return clock - tsk->vtime_snap;
 }
 





* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-14 14:07 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote: 
> On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > Greetings,
> > 
> > Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> > accrue zero utime/stime.  However, the same exact kernel on E5620 box
> > works fine, so it would appear there's a CPU dependency somewhere.
> 
> Ah indeed, I just managed to reproduce the same issue.
> 
> > 
> > Is core2 expected to go dysfunctional with context tracking enabled?
> > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > boxen only, same exact kernel continues to work just fine on E5620
> > (Westmere) box.
> 
> There was no known issue with core2. The box where I'm seeing it
> is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> likely to happen with this low number. I don't know.
> 
> I'm investigating further.

So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
(pure tsc) with local_clock()/sched_clock_cpu(cpu).  The former is
always quite a bit ahead of the latter, so mixing clocks is a nogo on
crusty old (but beloved) core2 box.

-Mike
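
The effect of mixing the two clocks can be reproduced in userspace. The sketch below is illustrative only: it copies the comparison from vtime_delta() and feeds it the two values from the trace line earlier in the thread. The helper names (delta_since, mixed_clock_delta, same_clock_delta) and the 1 ms interval are assumptions made for the example, not kernel code.

```c
/* Values copied from the trace line earlier in the thread. */
#define TRACE_LOCAL_CLOCK 289702961236ULL /* "clock:", read via local_clock() */
#define TRACE_VTIME_SNAP  290493017701ULL /* "vtime_snap:", stored from sched_clock() */

/* Same comparison as vtime_delta(): a snapshot that lies in the future yields 0. */
static unsigned long long delta_since(unsigned long long now,
                                      unsigned long long snap)
{
	if (now < snap)
		return 0;

	return now - snap;
}

/* Mixed clocks: the snapshot comes from the clock that runs ahead. */
static unsigned long long mixed_clock_delta(void)
{
	return delta_since(TRACE_LOCAL_CLOCK, TRACE_VTIME_SNAP);
}

/*
 * Hypothetical consistent case: snapshot and current reading come from
 * the same clock, taken 1 ms (1000000 ns) apart.
 */
static unsigned long long same_clock_delta(void)
{
	unsigned long long snap = TRACE_LOCAL_CLOCK;
	unsigned long long now = snap + 1000000ULL;

	return delta_since(now, snap);
}
```

With the snapshot taken from the clock that runs ahead, the delta is clamped to zero on every call, which matches tasks accruing no utime/stime. In the trace the snapshot leads the local_clock() reading by roughly 790 ms.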



* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-15  0:26 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote: 
> > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > Greetings,
> > > 
> > > Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > accrue zero utime/stime.  However, the same exact kernel on E5620 box
> > > works fine, so it would appear there's a CPU dependency somewhere.
> > 
> > Ah indeed, I just managed to reproduce the same issue.
> > 
> > > 
> > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > boxen only, same exact kernel continues to work just fine on E5620
> > > (Westmere) box.
> > 
> > There was no known issue with core2. The box where I'm seeing it
> > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > likely to happen with this low number. I don't know.
> > 
> > I'm investigating further.
> 
> So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> (pure tsc) with local_clock()/sched_clock_cpu(cpu).  The former is
> always quite a bit ahead of the latter, so mixing clocks is a nogo on
> crusty old (but beloved) core2 box.

Right I have the same issue. So let's use local_clock() everywhere here,
it takes care of unstable tsc.

Does the following fix the issue for you?

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index cc2dc3e..1ce322f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -747,7 +748,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
 
 	write_seqlock(&current->vtime_seqlock);
 	current->vtime_snap_whence = VTIME_SYS;
-	current->vtime_snap = sched_clock();
+	current->vtime_snap = local_clock();
 	write_sequnlock(&current->vtime_seqlock);
 }
 
@@ -757,7 +758,7 @@ void vtime_init_idle(struct task_struct *t)
 
 	write_seqlock_irqsave(&t->vtime_seqlock, flags);
 	t->vtime_snap_whence = VTIME_SYS;
-	t->vtime_snap = sched_clock();
+	t->vtime_snap = local_clock();
 	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
 }
 


* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Mike Galbraith @ 2013-05-15  4:09 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Paul E. McKenney

On Wed, 2013-05-15 at 02:26 +0200, Frederic Weisbecker wrote: 
> On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> > On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote: 
> > > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > > Greetings,
> > > > 
> > > > Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > > accrue zero utime/stime.  However, the same exact kernel on E5620 box
> > > > works fine, so it would appear there's a CPU dependency somewhere.
> > > 
> > > Ah indeed, I just managed to reproduce the same issue.
> > > 
> > > > 
> > > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > > boxen only, same exact kernel continues to work just fine on E5620
> > > > (Westmere) box.
> > > 
> > > There was no known issue with core2. The box where I'm seeing it
> > > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > > likely to happen with this low number. I don't know.
> > > 
> > > I'm investigating further.
> > 
> > So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> > (pure tsc) with local_clock()/sched_clock_cpu(cpu).  The former is
> > always quite a bit ahead of the latter, so mixing clocks is a nogo on
> > crusty old (but beloved) core2 box.
> 
> Right I have the same issue. So let's use local_clock() everywhere here,
> it takes care of unstable tsc.
> 
> Does the following fix the issue for you?

Yeah, both can use sched_clock_cpu() instead though.

> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index cc2dc3e..1ce322f 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -747,7 +748,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
>  
>  	write_seqlock(&current->vtime_seqlock);
>  	current->vtime_snap_whence = VTIME_SYS;
> -	current->vtime_snap = sched_clock();
> +	current->vtime_snap = local_clock();
>  	write_sequnlock(&current->vtime_seqlock);
>  }
>  
> @@ -757,7 +758,7 @@ void vtime_init_idle(struct task_struct *t)
>  
>  	write_seqlock_irqsave(&t->vtime_seqlock, flags);
>  	t->vtime_snap_whence = VTIME_SYS;
> -	t->vtime_snap = sched_clock();
> +	t->vtime_snap = local_clock();
>  	write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
>  }
>  




* Re: dynticks: CONFIG_VIRT_CPU_ACCOUNTING +  CONFIG_CONTEXT_TRACKING breaks accounting on core2 CPUs only
From: Frederic Weisbecker @ 2013-05-15 16:05 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Paul E. McKenney

On Wed, May 15, 2013 at 06:09:15AM +0200, Mike Galbraith wrote:
> On Wed, 2013-05-15 at 02:26 +0200, Frederic Weisbecker wrote: 
> > On Tue, May 14, 2013 at 04:07:20PM +0200, Mike Galbraith wrote:
> > > On Tue, 2013-05-14 at 02:57 +0200, Frederic Weisbecker wrote: 
> > > > On Sun, May 12, 2013 at 10:17:49AM +0200, Mike Galbraith wrote:
> > > > > Greetings,
> > > > > 
> > > > > Turning on new NO_HZ feature on my Q6600 box in master, I see that tasks
> > > > > accrue zero utime/stime.  However, the same exact kernel on E5620 box
> > > > > works fine, so it would appear there's a CPU dependency somewhere.
> > > > 
> > > > Ah indeed, I just managed to reproduce the same issue.
> > > > 
> > > > > 
> > > > > Is core2 expected to go dysfunctional with context tracking enabled?
> > > > > CONFIG_VIRT_CPU_ACCOUNTING alone works fine in 3.9-stable, turn on
> > > > > CONFIG_CONTEXT_TRACKING_FORCE, and CPU accounting stops working on core2
> > > > > boxen only, same exact kernel continues to work just fine on E5620
> > > > > (Westmere) box.
> > > > 
> > > > There was no known issue with core2. The box where I'm seeing it
> > > > is a Phenom quad core that had NR_CPUS=2. Maybe the issue is more
> > > > likely to happen with this low number. I don't know.
> > > > 
> > > > I'm investigating further.
> > > 
> > > So with CONFIG_HAVE_UNSTABLE_SCHED_CLOCK, you can't mix sched_clock()
> > > (pure tsc) with local_clock()/sched_clock_cpu(cpu).  The former is
> > > always quite a bit ahead of the latter, so mixing clocks is a nogo on
> > > crusty old (but beloved) core2 box.
> > 
> > Right I have the same issue. So let's use local_clock() everywhere here,
> > it takes care of unstable tsc.
> > 
> > Does the following fix the issue for you?
> 
> Yeah, both can use sched_clock_cpu() instead though.

Right, given that irqs are already disabled. I'm preparing the patch.

Thanks!
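
Why sched_clock_cpu() can stand in for local_clock() at these two call sites can be sketched in userspace. This is a rough model resting on an assumption about the 3.9-era kernel/sched/clock.c, namely that local_clock() is sched_clock_cpu() for the current CPU wrapped in an irq save/restore. Every model_* name, the fake per-CPU values, and the irq flag are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins so the sketch compiles outside the kernel. */
static bool model_irqs_enabled = true;
static int model_this_cpu;
static unsigned long long model_cpu_clock[2] = { 1000ULL, 2000ULL };

/* Models sched_clock_cpu(cpu): the drift-filtered clock of one CPU. */
static unsigned long long model_sched_clock_cpu(int cpu)
{
	return model_cpu_clock[cpu];
}

/*
 * Models local_clock() under the assumption stated above: the same
 * per-CPU read, bracketed by an irq save/restore so the task cannot be
 * interrupted between picking the CPU and reading its clock.
 */
static unsigned long long model_local_clock(void)
{
	bool saved = model_irqs_enabled;	/* stands in for local_irq_save() */
	unsigned long long clock;

	model_irqs_enabled = false;
	clock = model_sched_clock_cpu(model_this_cpu);
	model_irqs_enabled = saved;		/* stands in for local_irq_restore() */

	return clock;
}
```

In this model both functions return the same value for the current CPU. The wrapper only adds the irq save/restore, which is redundant when the caller already runs with irqs disabled.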

