CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
@ 2007-05-28  1:54 Satoru Takeuchi
  2007-05-28  6:55 ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 8+ messages in thread
From: Satoru Takeuchi @ 2007-05-28  1:54 UTC (permalink / raw)
  To: Linux Kernel
  Cc: Rusty Russell, Srivatsa Vaddagiri, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy,
	Satoru Takeuchi

Hi,

I found a bug on CPU hotplug. If `pfmon --system-wide' is running,
CPU hot remove causes system hang. On the other hand, CPU hot add
during that command seems to work fine.

I detected this problem on my ia64 box, and I don't know that this
problem is also occur on any arch or freezer based CPU hotplug.
Since I'm investigating this problem now, I'm glad if someone
reports the test result on other arch or freezer.

Currently I don't know anything about the cause of this problem.

How To Reproduce
================

Assuming you have two terminal, named term A and term B.

1) execute pfmon from term A

$ pfmon --system-wide
<press ENTER to stop session>

2) issue CPU hot remove from term B

# echo 0 >/sys/devices/system/cpu/cpu1/online

3) Press enter from term B

Expected Result
===============

Succeed to offline CPU

Actual Result
=============

System hang occurs and there are the following messages on serial console.

-------------------------------------------------------------------------------
CPU0                       1925600 CPU_CYCLES
CPU1 could not stop monitoring, CPU may be offline, check results
pfm_write_old_pmds error [3] Device or resource busy
pfmon_read_pmds error Device or resource busy
CPU1 read_results error
-------------------------------------------------------------------------------

Thanks,

Satoru

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-28  1:54 CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide' Satoru Takeuchi
@ 2007-05-28  6:55 ` Srivatsa Vaddagiri
  2007-05-29 20:56   ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-28  6:55 UTC (permalink / raw)
  To: Satoru Takeuchi
  Cc: Linux Kernel, Rusty Russell, Zwane Mwaikambo, Nathan Lynch,
	Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy, akpm,
	torvalds, Dipankar

On Mon, May 28, 2007 at 10:54:43AM +0900, Satoru Takeuchi wrote:
> Since I'm investigating this problem now, I'm glad if someone
> reports the test result on other arch or freezer.

I suspect the freezer based approach will not have this problem. Gautham
could probably verify that.

Andrew/Linus,
	So is it settled now on what approach we are going to follow (freezer 
vs lock based) for cpu hotplug? I thought that Linus was not favouring freezer 
based approach sometime back ..

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-28  6:55 ` Srivatsa Vaddagiri
@ 2007-05-29 20:56   ` Linus Torvalds
  2007-05-30  2:42     ` Rusty Russell
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Linus Torvalds @ 2007-05-29 20:56 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, akpm, Dipankar



On Mon, 28 May 2007, Srivatsa Vaddagiri wrote:
>
> 	So is it settled now on what approach we are going to follow (freezer 
> vs lock based) for cpu hotplug? I thought that Linus was not favouring freezer 
> based approach sometime back ..

As far as I'm concerned, we should
 - use "preempt_disable()" to protect against CPU's coming and going 
 - use "stop_machine()" or similar that already honors preemption, and 
   which I trust a whole lot  more than freezer.
 - .. especially since this is already how we are supposed to be protected 
   against CPU's going away, and we've already started doing that (for an 
   example of this, see things like e18f3ffb9c from Andrew)

It really does seem fairly straightforward to make "__cpu_up()" be called 
through stop_machine too. Looking at _cpu_down:

        mutex_lock(&cpu_bitmask_lock);
        p = __stop_machine_run(take_cpu_down, NULL, cpu);
        mutex_unlock(&cpu_bitmask_lock);

and then looking at _cpu_up:

        mutex_lock(&cpu_bitmask_lock);
        ret = __cpu_up(cpu);
        mutex_unlock(&cpu_bitmask_lock);

I just go "Aww, wouldn't it be nice to just make that "__cpu_up()" call be 
done through __stop_machine_run() too?"

Hmm?

Then, you could get the "cpu_bitmask_lock" if you need to sleep, but if 
you don't want to do that (and quite often you don't), just doing a 
"preempt_disable()" or taking a spinlock will *also* guarantee that no new 
CPU's suddenly show up, so it's safe to look at the CPU online bitmasks.

Do we really need anything else?

As mentioned, it's actually fairly easy to add verification calls to make 
sure that certain accesses are done with preemption disabled, so..

			Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-29 20:56   ` Linus Torvalds
@ 2007-05-30  2:42     ` Rusty Russell
  2007-05-30 16:55     ` Srivatsa Vaddagiri
  2007-06-06 15:24     ` Gautham R Shenoy
  2 siblings, 0 replies; 8+ messages in thread
From: Rusty Russell @ 2007-05-30  2:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srivatsa Vaddagiri, Satoru Takeuchi, Linux Kernel,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, Gautham R Shenoy, akpm, Dipankar

On Tue, 2007-05-29 at 13:56 -0700, Linus Torvalds wrote:
> 
> On Mon, 28 May 2007, Srivatsa Vaddagiri wrote:
> >
> > 	So is it settled now on what approach we are going to follow (freezer 
> > vs lock based) for cpu hotplug? I thought that Linus was not favouring freezer 
> > based approach sometime back ..
> 
> As far as I'm concerned, we should
>  - use "preempt_disable()" to protect against CPU's coming and going 
>  - use "stop_machine()" or similar that already honors preemption, and 
>    which I trust a whole lot  more than freezer.
>  - .. especially since this is already how we are supposed to be protected 
>    against CPU's going away, and we've already started doing that (for an 
>    example of this, see things like e18f3ffb9c from Andrew)

Indeed, this is how it was supposed to work.

	Note that it is possible to make stop_machine() an even larger hammer,
by scheduler mods to flush all the preempted tasks.  This would drop the
requirement for preempt_disable().

But cute as that would be, I've been waiting until someone demonstrates
an actual need...

Rusty.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-29 20:56   ` Linus Torvalds
  2007-05-30  2:42     ` Rusty Russell
@ 2007-05-30 16:55     ` Srivatsa Vaddagiri
  2007-05-30 17:03       ` Linus Torvalds
  2007-06-06 15:24     ` Gautham R Shenoy
  2 siblings, 1 reply; 8+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-30 16:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, akpm, Dipankar

On Tue, May 29, 2007 at 01:56:24PM -0700, Linus Torvalds wrote:
> As far as I'm concerned, we should
>  - use "preempt_disable()" to protect against CPU's coming and going 
>  - use "stop_machine()" or similar that already honors preemption, and 
>    which I trust a whole lot  more than freezer.
>  - .. especially since this is already how we are supposed to be protected 
>    against CPU's going away, and we've already started doing that (for an 
>    example of this, see things like e18f3ffb9c from Andrew)
> 
> It really does seem fairly straightforward to make "__cpu_up()" be called 
> through stop_machine too. Looking at _cpu_down:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         p = __stop_machine_run(take_cpu_down, NULL, cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> and then looking at _cpu_up:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         ret = __cpu_up(cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> I just go "Aww, wouldn't it be nice to just make that "__cpu_up()" call be 
> done through __stop_machine_run() too?"
> 
> Hmm?
> 
> Then, you could get the "cpu_bitmask_lock" if you need to sleep,

and that's where all the problems started - sleepers needing to take that mutex 
recursively (which we did/do not support).

	foo() takes cpu_bitmask_lock and calls
	  foo_bar() which also needs cpu_bitmask_lock

What is a solution to that?

	- Forget (hide?) this whole locking mess by using freezer, which
	  is what Andrew wanted us to shoot for :) I am somewhat biased
	  with Andrew here in that I think it will lead to more stabler cpu 
	  hotplug code over time. Again I know some people will beg to differ 
	  on this view.

	- extend mutexes to support recursion (which I gather Linux has 
	  religiously avoided so far)

	- invent a special lock for cpu hotplug which supports recursion. 
	  This is what Gautham tried doing with [1], with the bonus that it 
	  made the lock extremely scalable for readers by using per-cpu 
	  reference counters and RCU. He is preparing to resend those patches 
	  against latest kernel atm

	- Anything else you can think of?

[1] http://lkml.org/lkml/2006/10/26/73


> but if you don't want to do that (and quite often you don't), just doing a 
> "preempt_disable()" or taking a spinlock will *also* guarantee that no new 
> CPU's suddenly show up, so it's safe to look at the CPU online bitmasks.
> 
> Do we really need anything else?

see above

> As mentioned, it's actually fairly easy to add verification calls to make 
> sure that certain accesses are done with preemption disabled, so..

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-30 16:55     ` Srivatsa Vaddagiri
@ 2007-05-30 17:03       ` Linus Torvalds
  2007-05-31 16:51         ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2007-05-30 17:03 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, akpm, Dipankar

On Wed, 30 May 2007, Srivatsa Vaddagiri wrote:
> 
> and that's where all the problems started - sleepers needing to take that mutex 
> recursively (which we did/do not support).
> 
> 	foo() takes cpu_bitmask_lock and calls
> 	  foo_bar() which also needs cpu_bitmask_lock
> 
> What is a solution to that?

The solution is to not have crap locking in the first place.

Or, if you absolutely have to, support recursive locking. But the bassic 
rule should always really be that the need for recursive locking is really 
a sign of a locking bug in the first place.

So what's so wrong with just admitting that the locking was crap, and 
doing it properly? The _proper_ locking doesn't need recursive locks, it 
just takes the lock at the outer layers, and then does *not* take it 
internally, because the called routines are called with the lock held and 
know they are safe.

We have been able to do that in _every_ single kernel subsystem. What's so 
special about cpufreq? NOTHING. Except that the code is sometimes horribly 
bad.

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-30 17:03       ` Linus Torvalds
@ 2007-05-31 16:51         ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 8+ messages in thread
From: Srivatsa Vaddagiri @ 2007-05-31 16:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Satoru Takeuchi, Linux Kernel, Rusty Russell, Zwane Mwaikambo,
	Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens,
	Gautham R Shenoy, akpm, Dipankar

On Wed, May 30, 2007 at 10:03:01AM -0700, Linus Torvalds wrote:
> > and that's where all the problems started - sleepers needing to take that mutex 
> > recursively (which we did/do not support).
> > 
> > 	foo() takes cpu_bitmask_lock and calls
> > 	  foo_bar() which also needs cpu_bitmask_lock
> > 
> > What is a solution to that?
> 
> The solution is to not have crap locking in the first place.

I wish it was that simple wrt cpu hotplug lock :)

It was not just cpufreq which tripped on this locking mess. There was
flush_workqueue and also more recently preempt version of rcu.

	flush_workqueue()
		mutex_lock(&hotplug_lock);	/* we can sleep below */
		for_each_online_cpu()
			flush_cpu_workqueue();
		mutex_unlock(&hotplug_lock);

First problem with this snippet was that a workqueue function for which
we are waiting to be flushed may want to take the same hotplug_lock,
which can deadlock (see http://lkml.org/lkml/2006/12/6/352 for
description of this).

Second problem was : keventd is running flush_workqueue() and one of the
work functions calls flush_workqueue() again which is an attempt to take
hotplug_lock() recursively.

Partly the situation is complex because of the intertwining of
so many subsystem around this one lock. In all cases, encoding two version of
a function, one which takes the hotplug_lock and the other which doesnt take,
is probably an option. IMHO that would just make the source too ugly and
difficult to audit for cpu hotplug safety.

Best would be to go for a lock which supports recursion (and which
avoids other problems inherent with the old mutex based lock) or as
Andrew insisted with the freezer.

> Or, if you absolutely have to, support recursive locking. But the bassic 
> rule should always really be that the need for recursive locking is really 
> a sign of a locking bug in the first place.
> 
> So what's so wrong with just admitting that the locking was crap, and 
> doing it properly? The _proper_ locking doesn't need recursive locks, it 
> just takes the lock at the outer layers, and then does *not* take it 
> internally, because the called routines are called with the lock held and 
> know they are safe.
> 
> We have been able to do that in _every_ single kernel subsystem. What's so 
> special about cpufreq? NOTHING. Except that the code is sometimes horribly 
> bad.

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide'
  2007-05-29 20:56   ` Linus Torvalds
  2007-05-30  2:42     ` Rusty Russell
  2007-05-30 16:55     ` Srivatsa Vaddagiri
@ 2007-06-06 15:24     ` Gautham R Shenoy
  2 siblings, 0 replies; 8+ messages in thread
From: Gautham R Shenoy @ 2007-06-06 15:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srivatsa Vaddagiri, Satoru Takeuchi, Linux Kernel, Rusty Russell,
	Zwane Mwaikambo, Nathan Lynch, Joel Schopp, Ashok Raj,
	Heiko Carstens, akpm, Dipankar

On Tue, May 29, 2007 at 01:56:24PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 28 May 2007, Srivatsa Vaddagiri wrote:
> >
> > 	So is it settled now on what approach we are going to follow (freezer 
> > vs lock based) for cpu hotplug? I thought that Linus was not favouring freezer 
> > based approach sometime back ..
> 
> As far as I'm concerned, we should
>  - use "preempt_disable()" to protect against CPU's coming and going 
>  - use "stop_machine()" or similar that already honors preemption, and 
>    which I trust a whole lot  more than freezer.
>  - .. especially since this is already how we are supposed to be protected 
>    against CPU's going away, and we've already started doing that (for an 
>    example of this, see things like e18f3ffb9c from Andrew)
> 

Yes, provided that the code sections which need protection against CPU's 
coming and going don't block, we surely can use preempt_disable/preempt_enable
for refcounting. But we do have scenarios where such code sections do
block. Vatsa has quoted a few of them in his mail.

> It really does seem fairly straightforward to make "__cpu_up()" be called 
> through stop_machine too. Looking at _cpu_down:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         p = __stop_machine_run(take_cpu_down, NULL, cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> and then looking at _cpu_up:
> 
>         mutex_lock(&cpu_bitmask_lock);
>         ret = __cpu_up(cpu);
>         mutex_unlock(&cpu_bitmask_lock);
> 
> I just go "Aww, wouldn't it be nice to just make that "__cpu_up()" call be 
> done through __stop_machine_run() too?"
> 

Sure, we can do it. But what is it going to solve?

The whole section from CPU_UP/DOWN_PREPARE to CPU_ONLINE/DEAD
is supposed to be atomic and not just __cpu_up/take_cpu_down.
These are the sections where subsystems create/destroy their per-cpu
resources.

Remember, the cpufreq problem was originally caused because we were 
releasing the cpu_bitmask_lock before handling CPU_DEAD, where some of
the cpufreq specific per-cpu data was being freed. And thus, we were
operating on stale data in the window between the release of 
cpu_bitmask_lock and handling of CPU_DEAD.

> Hmm?
> 
> Then, you could get the "cpu_bitmask_lock" if you need to sleep, but if 
> you don't want to do that (and quite often you don't), just doing a 
> "preempt_disable()" or taking a spinlock will *also* guarantee that no new 
> CPU's suddenly show up, so it's safe to look at the CPU online bitmasks.
> 
> Do we really need anything else?
> 

* We don't need locks/mutexes because (bad) experience tells us that
  in this case, locking is not the right model. 

* Despite having implemented it, I am not very much convinced about
  freezer because it hides the cpu-hotplug details from subsystems, which
  IMHO is not a good thing. 

* Like every other resource, if people don't want a cpu to go down/come up, 
  they should bump up an associated refcount. But since we need
  this refcounting model to allow blocking code sections too, we cannot
  use preempt_enable/disable. 

Therefore sir, we do need nice scalable refcounting model :)

> As mentioned, it's actually fairly easy to add verification calls to make 
> sure that certain accesses are done with preemption disabled, so..
> 
> 			Linus

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-06-06 15:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-28  1:54 CPU hotplug: system hang on CPU hot remove during `pfmon --system-wide' Satoru Takeuchi
2007-05-28  6:55 ` Srivatsa Vaddagiri
2007-05-29 20:56   ` Linus Torvalds
2007-05-30  2:42     ` Rusty Russell
2007-05-30 16:55     ` Srivatsa Vaddagiri
2007-05-30 17:03       ` Linus Torvalds
2007-05-31 16:51         ` Srivatsa Vaddagiri
2007-06-06 15:24     ` Gautham R Shenoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox