public inbox for linux-kernel@vger.kernel.org
* idle issues running sembench on 128 cpus
@ 2011-05-04 21:47 Dave Kleikamp
  2011-05-04 22:04 ` Thomas Gleixner
  2011-05-04 22:07 ` Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Kleikamp @ 2011-05-04 21:47 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Chris Mason, Peter Zijlstra, Tim Chen, linux-kernel

Thomas,
I've been looking at performance running sembench on a 128-cpu system 
and I'm running into some issues in the idle loop.

Initially, I was seeing a lot of contention on the clockevents_lock in 
clockevents_notify(). Assuming it is only protecting clockevents_chain, 
and not the handlers themselves, I changed this to an rwlock (with 
thoughts of using rcu if successful).

This didn't help, but exposed an underlying problem with high contention 
on tick_broadcast_lock in tick_broadcast_oneshot_control(). I think with 
this many cpus, tick_handle_oneshot_broadcast() is holding that lock a 
long time, causing the idle cpus to spin on the lock.

I am able to avoid this problem with either of the kernel parameters 
"idle=mwait" or "processor.max_cstate=1". Similarly, defining 
CONFIG_INTEL_IDLE=y and using the kernel parameter 
intel_idle.max_cstate=1 exposes a different spinlock, pm_qos_lock, but I 
found this patch which fixes that contention:
https://lists.linux-foundation.org/pipermail/linux-pm/2011-February/030266.html
https://patchwork.kernel.org/patch/550721/

Of course, we'd like to find a way to reduce the spinlock contention and 
not resort to prohibiting the cpus from entering C3 state at all. I 
don't see a simple fix, and want to know if you've seen anything like 
this before and given it any thought.

I also don't know if it makes sense to be able to tune the cpuidle 
governors to add more resistance to entering the C3 state, or even to 
be able to switch to a performance governor at runtime, similar to cpufreq.

I'd like to hear your thoughts before I dive any deeper into this.

Thanks,
Shaggy


* Re: idle issues running sembench on 128 cpus
  2011-05-04 21:47 idle issues running sembench on 128 cpus Dave Kleikamp
@ 2011-05-04 22:04 ` Thomas Gleixner
  2011-05-04 22:07 ` Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-04 22:04 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Chris Mason, Peter Zijlstra, Tim Chen, linux-kernel

Dave,

On Wed, 4 May 2011, Dave Kleikamp wrote:

> Thomas,
> I've been looking at performance running sembench on a 128-cpu system and I'm
> running into some issues in the idle loop.
> 
> Initially, I was seeing a lot of contention on the clockevents_lock in
> clockevents_notify(). Assuming it is only protecting clockevents_chain, and
> not the handlers themselves, I changed this to an rwlock (with thoughts of
> using rcu if successful).
> 
> This didn't help, but exposed an underlying problem with high contention on
> tick_broadcast_lock in tick_broadcast_oneshot_control(). I think with this
> many cpus, tick_handle_oneshot_broadcast() is holding that lock a long time,
> causing the idle cpus to spin on the lock.
> 
> I am able to avoid this problem with either kernel parameter, "idle=mwait" or
> "processor.max_cstate=1". Similarly, defining CONFIG_INTEL_IDLE=y and using
> the kernel parameter intel_idle.max_cstate=1 exposes a different spinlock,
> pm_qos_lock, but I found this patch which fixes that contention:
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-February/030266.html
> https://patchwork.kernel.org/patch/550721/
> 
> Of course, we'd like to find a way to reduce the spinlock contention and not
> resort to prohibiting the cpus from entering C3 state at all. I don't see a
> simple fix, and want to know if you've seen anything like this before and
> given it any thought.
> 
> I also don't know if it makes sense to be able to tune the cpuidle governors
> to add more resistance to enter the C3 state, or even being able to switch to
> a performance governor at runtime, similar to cpufreq.
> 
> I'd like to hear your thoughts before I dive any deeper into this.

Tick broadcasting for more than a few CPUs simply does not scale and
never will.

There is no real way to avoid the global lock if all you have is
_ONE_ global working event device and N cpus which try to work around
their f*cked up local apics when deeper C-States are entered.

The same problem exists with the TSC, which stops in deeper C-States. You
just don't see lock contention because we rely on the HW serialization
of HPET or PM_TIMER, which is a huge bottleneck when you try to do
timekeeping-related stuff at high frequency on more than a handful of
cores at the same time. Just benchmark a tight loop of gettimeofday()
or clock_gettime() on such a machine with and without max_cstate=1 on
the kernel command line.

We could perhaps get away w/o the locking for the NOHZ=n and HIGHRES=n
case, but I doubt that you want to have that given that you don't want
to restrict C-States either. C-states do not make much sense without
NOHZ=y at least.

We tried to beat sense into unnamed HW manufacturers for years and it
took a little more than a decade before they started to act on
it :(

Thanks,

	tglx


* Re: idle issues running sembench on 128 cpus
  2011-05-04 21:47 idle issues running sembench on 128 cpus Dave Kleikamp
  2011-05-04 22:04 ` Thomas Gleixner
@ 2011-05-04 22:07 ` Andi Kleen
  2011-05-04 22:34   ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2011-05-04 22:07 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Thomas Gleixner, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

Dave Kleikamp <dkleikamp@gmail.com> writes:
>
> I am able to avoid this problem with either kernel parameter,
> "idle=mwait" or "processor.max_cstate=1". Similarly, defining
> CONFIG_INTEL_IDLE=y and using the kernel parameter
> intel_idle.max_cstate=1 exposes a different spinlock, pm_qos_lock, but
> I found this patch which fixes that contention:
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-February/030266.html
> https://patchwork.kernel.org/patch/550721/

The pm_qos patch really needs to be merged ASAP. Len?

> Of course, we'd like to find a way to reduce the spinlock contention
> and not resort to prohibiting the cpus from entering C3 state at
> all. I don't see a simple fix, and want to know if you've seen
> anything like this before and given it any thought.
>
> I also don't know if it makes sense to be able to tune the cpuidle
> governors to add more resistance to enter the C3 state, or even being
> able to switch to a performance governor at runtime, similar to
> cpufreq.
>
> I'd like to hear your thoughts before I dive any deeper into this.

It's fixed on Westmere. There the APIC timer always ticks,
so all that logic is no longer needed and is disabled.

That is mostly fixed already. One problem right now is that the
CLOCK_EVT_FEAT_C3STOP test is inside the lock. But we
can easily move it out, assuming the clock_event_device
gets RCU-freed or has a reference count.

But yes, it would still be good to fix Nehalem too.

One fix would be to make all the masks hierarchical,
similar to what RCU does. Perhaps some code could
even be shared with RCU, because it's a very
similar problem.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: idle issues running sembench on 128 cpus
  2011-05-04 22:07 ` Andi Kleen
@ 2011-05-04 22:34   ` Thomas Gleixner
  2011-05-04 23:03     ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-04 22:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Wed, 4 May 2011, Andi Kleen wrote:
> Dave Kleikamp <dkleikamp@gmail.com> writes:
> > I also don't know if it makes sense to be able to tune the cpuidle
> > governors to add more resistance to enter the C3 state, or even being
> > able to switch to a performance governor at runtime, similar to
> > cpufreq.
> >
> > I'd like to hear your thoughts before I dive any deeper into this.
> 
> It's fixed on Westmere. There the APIC timer will always tick
> and all that logic is not needed anymore and disabled.
> 
> That is mostly fixed. One problem right now is that the
> CLOCK_EVT_FEAT_C3STOP test is inside the lock. But we
> can easily move it out, assuming the clock_event_device
> gets RCU freed or has a reference count.

No, it does not even need refcounting. We can access it outside of the
lock, as this is atomic context called on the cpu which is about to go
idle and therefore the device cannot go away. Easy and straightforward
fix.
 
> But yes it would be still good to fix Nehalem too.
> 
> One fix would be to make all the masks hierarchical,
> similar to what RCU does. Perhaps even some code 
> could be shared with RCU on that because it's a very
> similar problem.

In theory. It's not about the mask. The mask is uninteresting. It's
about the expiry time, which we have to protect. There is nothing
hierarchical about that. It all boils down to _ONE_ single functional
device, and you don't want to miss your deadline just because you
decided to be extra clever. RCU does not care much whether you run the
callbacks a tick later or not. Time and timekeeping does.

Thanks,

	tglx


* Re: idle issues running sembench on 128 cpus
  2011-05-04 22:34   ` Thomas Gleixner
@ 2011-05-04 23:03     ` Andi Kleen
  2011-05-04 23:29       ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2011-05-04 23:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

> No, it does not even need refcounting. We can access it outside of the

Ok.

> lock as this is atomic context called on the cpu which is about to go
> idle and therefor the device cannot go away. Easy and straightforward
> fix.

Ok. Patch appended. Looks good?

BTW why must the lock be irqsave?

>  
> > But yes it would be still good to fix Nehalem too.
> > 
> > One fix would be to make all the masks hierarchical,
> > similar to what RCU does. Perhaps even some code 
> > could be shared with RCU on that because it's a very
> > similar problem.
> 
> In theory. It's not about the mask. The mask is uninteresting. It's
> about the expiry time, which we have to protect. There is nothing
> hierarchical about that. It all boils down on _ONE_ single functional

The mask can be used to see if another thread on this core is still
running. If so, you don't need that. Right now Linux doesn't 
know that, but it could be taught. The only problem is that once
the other thread goes idle too, their timeouts have to be merged.

This would cut contention in half.

Also if it's HPET you could actually use multiple independent HPET channels.
I remember us discussing this a long time ago... Not sure if it's worth
it, but it may be a small relief.

> device and you don't want to miss out your deadline just because you
> decided to be extra clever. RCU does not care much whether you run the
> callbacks a tick later or not. Time and timekeeping does.

You can at least check lockless if someone else has a <= timeout, right?

-Andi

---

Move C3 stop test outside lock

Avoid taking locks in the idle path for systems where the timer
doesn't stop in C3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index da800ff..9cf0415 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	unsigned long flags;
 	int cpu;
 
-	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
-
 	/*
 	 * Periodic mode does not care about the enter/exit of power
 	 * states
 	 */
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
-		goto out;
+		return;
 
+	cpu = raw_smp_processor_id();
 	bc = tick_broadcast_device.evtdev;
-	cpu = smp_processor_id();
 	td = &per_cpu(tick_cpu_device, cpu);
 	dev = td->evtdev;
 
 	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
-		goto out;
+		return;
 
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());


* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:03     ` Andi Kleen
@ 2011-05-04 23:29       ` Thomas Gleixner
  2011-05-04 23:42         ` Andi Kleen
  2011-05-05 13:58         ` idle issues running sembench on 128 cpus Thomas Gleixner
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-04 23:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Thu, 5 May 2011, Andi Kleen wrote:
> > No, it does not even need refcounting. We can access it outside of the
> 
> Ok.
> 
> > lock as this is atomic context called on the cpu which is about to go
> > idle and therefore the device cannot go away. Easy and straightforward
> > fix.
> 
> Ok. Patch appended. Looks good?

Mostly. See below.
 
> BTW why must the lock be irqsave?

Good question. Probably safety first paranoia :)

Indeed that code should only be called from irq-disabled regions, so
we could avoid the irqsave there. Otherwise it needs to be irqsave
for obvious reasons.

> > > But yes it would be still good to fix Nehalem too.
> > > 
> > > One fix would be to make all the masks hierarchical,
> > > similar to what RCU does. Perhaps even some code 
> > > could be shared with RCU on that because it's a very
> > > similar problem.
> > 
> > In theory. It's not about the mask. The mask is uninteresting. It's
> > about the expiry time, which we have to protect. There is nothing
> > hierarchical about that. It all boils down to _ONE_ single functional
> 
> The mask can be used to see if another thread on this core is still
> running. If yes you don't need that. Right now Linux doesn't 
> know that, but it could be taught. The only problem is that once
> the other guy goes idle too their timeouts have to be merged.
> 
> This would cut contention in half.

That makes sense, but merging the timeouts race free will be a real
PITA.

> Also if it's HPET you could actually use multiple independent HPET channels.
> I remember us discussing this a long time ago... Not sure if it's worth
> it, but it may be a small relief.

Multiple broadcast devices. That sounds still horrible :)
 
> > device and you don't want to miss out your deadline just because you
> > decided to be extra clever. RCU does not care much whether you run the
> > callbacks a tick later or not. Time and timekeeping does.
> 
> You can at least check lockless if someone else has a <= timeout, right?

Might be worth a try. Need some sleep to remember why I discarded that
idea long ago.

> -Andi
> 
> ---
> 
> Move C3 stop test outside lock
> 
> Avoid taking locks in the idle path for systems where the timer
> doesn't stop in C3.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index da800ff..9cf0415 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
>  	unsigned long flags;
>  	int cpu;
>  
> -	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
> -
>  	/*
>  	 * Periodic mode does not care about the enter/exit of power
>  	 * states
>  	 */
>  	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> -		goto out;
> +		return;
>  
> +	cpu = raw_smp_processor_id();

Why raw_ ? As I said above, this should always be called with irqs
disabled.

If that ever gets called from an irq-enabled, preemptible and
migratable context, then we open up a very narrow but ugly-to-debug
race window, as we can look at the wrong per-cpu device.

>  	bc = tick_broadcast_device.evtdev;
> -	cpu = smp_processor_id();
>  	td = &per_cpu(tick_cpu_device, cpu);
>  	dev = td->evtdev;

Thanks,

	tglx


* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:29       ` Thomas Gleixner
@ 2011-05-04 23:42         ` Andi Kleen
  2011-05-04 23:47           ` Thomas Gleixner
  2011-05-04 23:48           ` idle issues running sembench on 128 cpus II Andi Kleen
  2011-05-05 13:58         ` idle issues running sembench on 128 cpus Thomas Gleixner
  1 sibling, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2011-05-04 23:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Thu, May 05, 2011 at 01:29:49AM +0200, Thomas Gleixner wrote:
> > > hierarchical about that. It all boils down to _ONE_ single functional
> > 
> > The mask can be used to see if another thread on this core is still
> > running. If yes you don't need that. Right now Linux doesn't 
> > know that, but it could be taught. The only problem is that once
> > the other guy goes idle too their timeouts have to be merged.
> > 
> > This would cut contention in half.
> 
> That makes sense, but merging the timeouts race free will be a real
> PITA.

For this case one could actually use a spinlock between the siblings.
That shouldn't be a problem as long as it's not a global spinlock.

> 
> > Also if it's HPET you could actually use multiple independent HPET channels.
> > I remember us discussing this a long time ago... Not sure if it's worth
> > it, but it may be a small relief.
> 
> Multiple broadcast devices. That sounds still horrible :)


It would cut contention in half or more at least. Not great,
but sometimes you take everything you can get.

> Might be worth a try. Need some sleep to remember why I discarded that
> idea long ago.

Ok.

Here's a new patch without the raw. Boots on my Westmere.

---

From: Andi Kleen <ak@linux.intel.com>
Subject: [PATCH] Move C3 stop test outside lock

Avoid taking locks in the idle path for systems where the timer
doesn't stop in C3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index da800ff..9cf0415 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	unsigned long flags;
 	int cpu;
 
-	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
-
 	/*
 	 * Periodic mode does not care about the enter/exit of power
 	 * states
 	 */
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
-		goto out;
+		return;
 
+	cpu = raw_smp_processor_id();
 	bc = tick_broadcast_device.evtdev;
-	cpu = smp_processor_id();
 	td = &per_cpu(tick_cpu_device, cpu);
 	dev = td->evtdev;
 
 	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
-		goto out;
+		return;
 
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());


* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:42         ` Andi Kleen
@ 2011-05-04 23:47           ` Thomas Gleixner
  2011-05-04 23:49             ` Andi Kleen
  2011-05-04 23:48           ` idle issues running sembench on 128 cpus II Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-04 23:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Thu, 5 May 2011, Andi Kleen wrote:
> On Thu, May 05, 2011 at 01:29:49AM +0200, Thomas Gleixner wrote:
> > That makes sense, but merging the timeouts race free will be a real
> > PITA.
> 
> For this case one could actually use a spinlock between the siblings.
> That shouldn't be a problem as long as it's not a global spinlock.

Care to give it a try ?

> > > Also if it's HPET you could actually use multiple independent HPET channels.
> > > I remember us discussing this a long time ago... Not sure if it's worth
> > > it, but it may be a small relief.
> > 
> > Multiple broadcast devices. That sounds still horrible :)
> 
> 
> It would cut contention in half or more at least. Not great,
> but sometimes you take everything you can get.

To a certain degree. If the code pain is larger than the benefit ...
 
> Here's a new patch without the raw. Boots on my Westmere.

> +	cpu = raw_smp_processor_id();

Hmm. quilt refresh perhaps ? I know that feeling :)

Thanks,

	tglx


* Re: idle issues running sembench on 128 cpus II
  2011-05-04 23:42         ` Andi Kleen
  2011-05-04 23:47           ` Thomas Gleixner
@ 2011-05-04 23:48           ` Andi Kleen
  2011-05-05 15:24             ` Dave Kleikamp
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2011-05-04 23:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Dave Kleikamp, Chris Mason, Peter Zijlstra,
	Tim Chen, linux-kernel, lenb, paulmck

> Here's a new patch without the raw. Boots on my Westmere.

Sorry, I appended the old patch. Here's the real new version.

From: Andi Kleen <ak@linux.intel.com>
Date: Wed, 4 May 2011 15:09:27 -0700
Subject: [PATCH] Move C3 stop test outside lock

Avoid taking locks in the idle path for systems where the timer
doesn't stop in C3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index da800ff..7f565b7 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	unsigned long flags;
 	int cpu;
 
-	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
-
 	/*
 	 * Periodic mode does not care about the enter/exit of power
 	 * states
 	 */
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
-		goto out;
+		return;
 
-	bc = tick_broadcast_device.evtdev;
 	cpu = smp_processor_id();
+	bc = tick_broadcast_device.evtdev;
 	td = &per_cpu(tick_cpu_device, cpu);
 	dev = td->evtdev;
 
 	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
-		goto out;
+		return;
 
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());



* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:47           ` Thomas Gleixner
@ 2011-05-04 23:49             ` Andi Kleen
  2011-05-04 23:51               ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2011-05-04 23:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

Thomas Gleixner <tglx@linutronix.de> writes:

> On Thu, 5 May 2011, Andi Kleen wrote:
>> On Thu, May 05, 2011 at 01:29:49AM +0200, Thomas Gleixner wrote:
>> > That makes sense, but merging the timeouts race free will be a real
>> > PITA.
>> 
>> For this case one could actually use a spinlock between the siblings.
>> That shouldn't be a problem as long as it's not a global spinlock.
>
> Care to give it a try ?

Ok, will try tomorrow.

>> Here's a new patch without the raw. Boots on my Westmere.
>
>> +	cpu = raw_smp_processor_id();
>
> Hmm. quilt refresh perhaps ? I know that feeling :)

The scp to copy the patch was too slow. Noticed it later and sent a new
one.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:49             ` Andi Kleen
@ 2011-05-04 23:51               ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-04 23:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Wed, 4 May 2011, Andi Kleen wrote:

> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> > On Thu, 5 May 2011, Andi Kleen wrote:
> >> On Thu, May 05, 2011 at 01:29:49AM +0200, Thomas Gleixner wrote:
> >> > That makes sense, but merging the timeouts race free will be a real
> >> > PITA.
> >> 
> >> For this case one could actually use a spinlock between the siblings.
> >> That shouldn't be a problem as long as it's not a global spinlock.
> >
> > Care to give it a try ?
> 
> Ok, will try tomorrow.
> 
> >> Here's a new patch without the raw. Boots on my Westmere.
> >
> >> +	cpu = raw_smp_processor_id();
> >
> > Hmm. quilt refresh perhaps ? I know that feeling :)
> 
> The scp to copy the patch was too slow. Noticed it later and sent a new
> one.

NP. Happens to hit everybody from time to time :)

Thanks,

	tglx


* Re: idle issues running sembench on 128 cpus
  2011-05-04 23:29       ` Thomas Gleixner
  2011-05-04 23:42         ` Andi Kleen
@ 2011-05-05 13:58         ` Thomas Gleixner
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2011-05-05 13:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Kleikamp, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On Thu, 5 May 2011, Thomas Gleixner wrote:
> On Thu, 5 May 2011, Andi Kleen wrote:
> > > No, it does not even need refcounting. We can access it outside of the
> > 
> > Ok.
> > 
> > > lock as this is atomic context called on the cpu which is about to go
> > > idle and therefore the device cannot go away. Easy and straightforward
> > > fix.
> > 
> > Ok. Patch appended. Looks good?
> 
> Mostly. See below.
>  
> > BTW why must the lock be irqsave?
> 
> Good question. Probably safety first paranoia :)
> 
> Indeed that code should only be called from irq disabled regions, so
> we could avoid the irqsave there. Otherwise that needs to be irqsave
> for obvious reasons.

Just looked through all the call sites. Both intel_idle and
processor_idle notify ENTER with interrupts disabled, but EXIT with
interrupts enabled. So if we want to remove irqsave from the
spinlock, that needs to be fixed as well.

Thanks,

        tglx


* Re: idle issues running sembench on 128 cpus II
  2011-05-04 23:48           ` idle issues running sembench on 128 cpus II Andi Kleen
@ 2011-05-05 15:24             ` Dave Kleikamp
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Kleikamp @ 2011-05-05 15:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Chris Mason, Peter Zijlstra, Tim Chen,
	linux-kernel, lenb, paulmck

On 05/04/2011 06:48 PM, Andi Kleen wrote:
>> Here's a new patch without the raw. Boots on my Westmere.
>
> Sorry appended the old patch. Here's really the new version.
>
> From: Andi Kleen<ak@linux.intel.com>
> Date: Wed, 4 May 2011 15:09:27 -0700
> Subject: [PATCH] Move C3 stop test outside lock
>
> Avoid taking locks in the idle path for systems where the timer
> doesn't stop in C3.
>
> Signed-off-by: Andi Kleen<ak@linux.intel.com>
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index da800ff..7f565b7 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
>   	unsigned long flags;
>   	int cpu;
>
> -	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
> -
>   	/*
>   	 * Periodic mode does not care about the enter/exit of power
>   	 * states
>   	 */
>   	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> -		goto out;
> +		return;
>
> -	bc = tick_broadcast_device.evtdev;
>   	cpu = smp_processor_id();
> +	bc = tick_broadcast_device.evtdev;
>   	td =&per_cpu(tick_cpu_device, cpu);
>   	dev = td->evtdev;
>
>   	if (!(dev->features&  CLOCK_EVT_FEAT_C3STOP))
> -		goto out;
> +		return;

The out: label is still defined but no longer used in this function.

>
> +	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
>   	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
>   		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
>   			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
>

