public inbox for linux-kernel@vger.kernel.org
* CPU offline/online related hang with latest git
@ 2009-12-20 10:04 Sachin Sant
  2009-12-20 13:31 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Sachin Sant @ 2009-12-20 10:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

With 2.6.33-rc1-git1 (dd59f6c..) I am having trouble with
CPU hotplug on multiple architectures. Trying to offline
a CPU results in a machine hang.

Sample output from a 4-way x86_64 box:

#:/sys/devices/system/cpu/cpu3 # cat online
1
#:/sys/devices/system/cpu/cpu3 # echo 0 > online
Stack:
Call Trace:
 <IRQ>
 <EOI>
Code: 45 f0 48 89 45 b8 48 8d 45 d0 4c 89 4d f8 c7 45 b0 10 00 00 00 48 89 45 c0 e8 5a ff ff ff c9 c3 89 f0 b9 40 00 00 00 55 99 f7 f9 <48> 89 e5 48 89 f9 48 83 ec 08 31 d2 41 89 c0 eb 12 48 8b 01 48
Stack:
Call Trace:
Code: 20 00 00 00 00 48 89 3e 49 8d 74 24 28 e8 45 75 1d 00 49 c7 44 24 50 00 00 00 00 48 8b 9b 60 01 00 00 48 85 db 0f 85 25 ff ff ff <5b> 41 5c c9 c3 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 81 ec
Stack:
Call Trace:
Code: 48 0f 45 d8 eb 1a 44 0f a3 20 19 c0 48 c7 c3 10 b5 a0 81 85 c0 48 c7 c0 30 b5 a0 81 48 0f 44 d8 31 c0 f3 90 44 8b 25 82 35 96 00 <41> 39 c4 74 46 41 83 fc 02 74 08 41 83 fc 03 75 12 eb 03 fa eb
Stack:
Call Trace:
Code: 48 c7 c0 30 b5 a0 81 48 0f 45 d8 eb 1a 44 0f a3 20 19 c0 48 c7 c3 10 b5 a0 81 85 c0 48 c7 c0 30 b5 a0 81 48 0f 44 d8 31 c0 f3 90 <44> 8b 25 82 35 96 00 41 39 c4 74 46 41 83 fc 02 74 08 41 83 fc
Uhhuh. NMI received for unknown reason 25 on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue

The above messages then repeat. I observed similar hangs
on other architectures as well (s390x, PowerPC, x86_32).
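
For reference, the reproduction step shown in the sample output (writing 0 to the CPU's sysfs `online` file) could be scripted with a small C helper. This is only a minimal sketch: `cpu_online_path()` and `write_cpu_state()` are hypothetical names introduced here, not kernel or libc APIs, and actually taking a CPU offline this way requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the sysfs path of a CPU's "online" file.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write "0" (offline) or "1" (online) to the file
 * at path. Returns 0 on success, -1 on any I/O error.
 */
static int write_cpu_state(const char *path, int online)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(online ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer; a failed write surfaces here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Building the path for CPU 3 and calling `write_cpu_state(path, 0)` would correspond to the `echo 0 > online` step above.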

2.6.33-rc1 worked fine. I haven't bisected yet; I will do that
first thing tomorrow morning.

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------



* Re: CPU offline/online related hang with latest git
  2009-12-20 10:04 CPU offline/online related hang with latest git Sachin Sant
@ 2009-12-20 13:31 ` Peter Zijlstra
  2009-12-20 14:51   ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-12-20 13:31 UTC (permalink / raw)
  To: Sachin Sant
  Cc: linux-kernel, Ingo Molnar, Heiko Carstens, Benjamin Herrenschmidt

On Sun, 2009-12-20 at 15:34 +0530, Sachin Sant wrote:
> With 2.6.33-rc1-git1 (dd59f6c..) I am having trouble with
> CPU hotplug on multiple architectures. Trying to offline
> a CPU results in a machine hang.

damnit, you're right.. it's getting stuck during hotplug on stop machine
some place.

didn't notice it because it didn't crash.. 

sorry for that :-(



* Re: CPU offline/online related hang with latest git
  2009-12-20 13:31 ` Peter Zijlstra
@ 2009-12-20 14:51   ` Jens Axboe
  2009-12-20 14:54     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2009-12-20 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

On Sun, Dec 20 2009, Peter Zijlstra wrote:
> On Sun, 2009-12-20 at 15:34 +0530, Sachin Sant wrote:
> > With 2.6.33-rc1-git1 (dd59f6c..) I am having trouble with
> > CPU hotplug on multiple architectures. Trying to offline
> > a CPU results in a machine hang.
> 
> damnit, you're right.. it's getting stuck during hotplug on stop machine
> some place.
> 
> didn't notice it because it didn't crash.. 
> 
> sorry for that :-(

Perhaps this is also why suspend doesn't work on current -git... -rc1 is
fine.

-- 
Jens Axboe



* Re: CPU offline/online related hang with latest git
  2009-12-20 14:51   ` Jens Axboe
@ 2009-12-20 14:54     ` Peter Zijlstra
  2009-12-20 14:57       ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-12-20 14:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

On Sun, 2009-12-20 at 15:51 +0100, Jens Axboe wrote:
> On Sun, Dec 20 2009, Peter Zijlstra wrote:
> > On Sun, 2009-12-20 at 15:34 +0530, Sachin Sant wrote:
> > > With 2.6.33-rc1-git1 (dd59f6c..) I am having trouble with
> > > CPU hotplug on multiple architectures. Trying to offline
> > > a CPU results in a machine hang.
> > 
> > damnit, you're right.. it's getting stuck during hotplug on stop machine
> > some place.
> > 
> > didn't notice it because it didn't crash.. 
> > 
> > sorry for that :-(
> 
> Perhaps this is also why suspend doesn't work on current -git... -rc1 is
> fine.

Yep, suspend relies on hotplug.



* Re: CPU offline/online related hang with latest git
  2009-12-20 14:54     ` Peter Zijlstra
@ 2009-12-20 14:57       ` Jens Axboe
  2009-12-20 15:00         ` Peter Zijlstra
  2009-12-20 16:36         ` [PATCH] sched: Fix hotplug Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Jens Axboe @ 2009-12-20 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

On Sun, Dec 20 2009, Peter Zijlstra wrote:
> On Sun, 2009-12-20 at 15:51 +0100, Jens Axboe wrote:
> > On Sun, Dec 20 2009, Peter Zijlstra wrote:
> > > On Sun, 2009-12-20 at 15:34 +0530, Sachin Sant wrote:
> > > > With 2.6.33-rc1-git1 (dd59f6c..) I am having trouble with
> > > > CPU hotplug on multiple architectures. Trying to offline
> > > > a CPU results in a machine hang.
> > > 
> > > damnit, you're right.. it's getting stuck during hotplug on stop machine
> > > some place.
> > > 
> > > didn't notice it because it didn't crash.. 
> > > 
> > > sorry for that :-(
> > 
> > Perhaps this is also why suspend doesn't work on current -git... -rc1 is
> > fine.
> 
> Yep, suspend relies on hotplug.

Exactly, hence the connection. Shall I bisect, or do you already know
what the problem is?

-- 
Jens Axboe



* Re: CPU offline/online related hang with latest git
  2009-12-20 14:57       ` Jens Axboe
@ 2009-12-20 15:00         ` Peter Zijlstra
  2009-12-20 16:36         ` [PATCH] sched: Fix hotplug Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-12-20 15:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

On Sun, 2009-12-20 at 15:57 +0100, Jens Axboe wrote:

> Exactly, hence the connection. Shall I bisect, or do you already know
> what the problem is?

I have a definite suspect alright, let me prod at this a bit more.



* [PATCH] sched: Fix hotplug
  2009-12-20 14:57       ` Jens Axboe
  2009-12-20 15:00         ` Peter Zijlstra
@ 2009-12-20 16:36         ` Peter Zijlstra
  2009-12-20 22:14           ` Jens Axboe
  2009-12-20 22:33           ` [tip:sched/urgent] sched: Fix hotplug hang tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-12-20 16:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

The hot-unplug kstopmachine usage does a wakeup after deactivating the
cpu, hence we cannot use cpu_active() here but must rely on the good
olde online.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 720df10..0ac4fa5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2348,7 +2348,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_active(cpu)))
+		     !cpu_online(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
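
The effect of the one-line change above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: `struct toy_cpu_state`, `enum toy_check`, and `toy_select_cpu()` are hypothetical names and not kernel code. It assumes, as the changelog describes, that a CPU being unplugged is first marked inactive while it is still online, and that a wakeup aimed at that CPU happens in between.

```c
#include <stdbool.h>

/* Toy model only; none of these types or functions exist in the kernel. */
struct toy_cpu_state {
	bool online;	/* CPU is still up and can run tasks */
	bool active;	/* CPU has not yet been deactivated for hot-unplug */
};

enum toy_check { TOY_CHECK_ACTIVE, TOY_CHECK_ONLINE };

/*
 * Mirrors the shape of the patched condition: keep the chosen CPU only
 * if the task is allowed to run there and the CPU passes the selected
 * check; otherwise return the fallback CPU.
 */
static int toy_select_cpu(const struct toy_cpu_state *cpus, int cpu,
			  bool allowed, enum toy_check check, int fallback)
{
	bool usable = (check == TOY_CHECK_ACTIVE) ? cpus[cpu].active
						  : cpus[cpu].online;

	if (!allowed || !usable)
		return fallback;
	return cpu;
}
```

In this model, a wakeup that targets a deactivated but still-online CPU is redirected to the fallback under `TOY_CHECK_ACTIVE` and stays on the intended CPU under `TOY_CHECK_ONLINE`, which is the distinction the changelog relies on.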




* Re: [PATCH] sched: Fix hotplug
  2009-12-20 16:36         ` [PATCH] sched: Fix hotplug Peter Zijlstra
@ 2009-12-20 22:14           ` Jens Axboe
  2009-12-20 22:33           ` [tip:sched/urgent] sched: Fix hotplug hang tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2009-12-20 22:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sachin Sant, linux-kernel, Ingo Molnar, Heiko Carstens,
	Benjamin Herrenschmidt

On Sun, Dec 20 2009, Peter Zijlstra wrote:
> The hot-unplug kstopmachine usage does a wakeup after deactivating the
> cpu, hence we cannot use cpu_active() here but must rely on the good
> olde online.

Yep, this works for me!

Tested-by: Jens Axboe <jens.axboe@oracle.com>

> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 720df10..0ac4fa5 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2348,7 +2348,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
>  	 *   not worry about this generic constraint ]
>  	 */
>  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> -		     !cpu_active(cpu)))
> +		     !cpu_online(cpu)))
>  		cpu = select_fallback_rq(task_cpu(p), p);
>  
>  	return cpu;
> 
> 

-- 
Jens Axboe



* [tip:sched/urgent] sched: Fix hotplug hang
  2009-12-20 16:36         ` [PATCH] sched: Fix hotplug Peter Zijlstra
  2009-12-20 22:14           ` Jens Axboe
@ 2009-12-20 22:33           ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Peter Zijlstra @ 2009-12-20 22:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, sachinp,
	jens.axboe, benh, heiko.carstens, tglx, mingo

Commit-ID:  70f1120527797adb31c68bdc6f1b45e182c342c7
Gitweb:     http://git.kernel.org/tip/70f1120527797adb31c68bdc6f1b45e182c342c7
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Sun, 20 Dec 2009 17:36:27 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 20 Dec 2009 23:31:23 +0100

sched: Fix hotplug hang

The hot-unplug kstopmachine usage does a wakeup after
deactivating the cpu, hence we cannot use cpu_active()
here but must rely on the good olde online.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1261326987.4314.24.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 7ffde2a..87f1f47 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2346,7 +2346,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_active(cpu)))
+		     !cpu_online(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
