public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
@ 2008-11-12 18:30 Maciej Rutecki
  2008-11-12 18:56 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej Rutecki @ 2008-11-12 18:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, lenb@kernel.org,
	linux-acpi

-rc3 works OK

I have this bug during suspend to disk:

[  188.592151] Enabling non-boot CPUs ...
[  188.592151] SMP alternatives: switching to SMP code
[  188.666058] BUG: using smp_processor_id() in preemptible [00000000]
code: suspend_to_disk/2934
[  188.666064] caller is native_sched_clock+0x2b/0x80
[  188.666067] Pid: 2934, comm: suspend_to_disk Not tainted 2.6.28-rc4 #1
[  188.666069] Call Trace:
[  188.666074]  [<c02556ff>] debug_smp_processor_id+0xbf/0xd0
[  188.666077]  [<c010a2db>] native_sched_clock+0x2b/0x80
[  188.666080]  [<c03ae3a0>] init_idle+0x88/0x11a
[  188.666083]  [<c03ad206>] native_cpu_up+0x4a6/0x75a
[  188.666086]  [<c01de808>] sysfs_add_file_mode+0x58/0x90
[  188.666089]  [<c03ad4ba>] do_fork_idle+0x0/0x19
[  188.666092]  [<c014864e>] notifier_call_chain+0x3e/0x80
[  188.666095]  [<c03ae99a>] _cpu_up+0x6a/0x103
[  188.666097]  [<c03a1e75>] enable_nonboot_cpus+0x65/0xa0
[  188.666101]  [<c02cd240>] device_pm_unlock+0x0/0x10
[  188.666105]  [<c0159ba7>] hibernation_snapshot+0x137/0x200
[  188.666107]  [<c0158dff>] freeze_processes+0x4f/0xa0
[  188.666110]  [<c0159d4f>] hibernate+0xdf/0x190
[  188.666112]  [<c0158910>] state_store+0x0/0xd0
[  188.666115]  [<c01589c2>] state_store+0xb2/0xd0
[  188.666117]  [<c0158910>] state_store+0x0/0xd0
[  188.666120]  [<c024b1d4>] kobj_attr_store+0x24/0x30
[  188.666122]  [<c01de221>] sysfs_write_file+0xa1/0x110
[  188.666124]  [<c01de180>] sysfs_write_file+0x0/0x110
[  188.666127]  [<c0198e38>] vfs_write+0xa8/0x140
[  188.666129]  [<c0198fa1>] sys_write+0x41/0x80
[  188.666131]  [<c0103f49>] sysenter_do_call+0x12/0x21
[  188.666133]  [<c03b0000>] acpi_processor_start+0x690/0x69a
[  188.666136] Booting processor 1 APIC 0x1 ip 0x6000
[  188.590718] Initializing CPU#1

dmesg and config:
http://unixy.pl/maciek/download/kernel/2.6.28-rc4_smp/

-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
  2008-11-12 18:30 [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934 Maciej Rutecki
@ 2008-11-12 18:56 ` Linus Torvalds
  2008-11-12 19:21   ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2008-11-12 18:56 UTC (permalink / raw)
  To: Maciej Rutecki, Ingo Molnar
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, lenb@kernel.org,
	linux-acpi



On Wed, 12 Nov 2008, Maciej Rutecki wrote:

> -rc3 works OK
> 
> I have this bug during suspend to disk:
> 
> [  188.592151] Enabling non-boot CPUs ...
> [  188.592151] SMP alternatives: switching to SMP code
> [  188.666058] BUG: using smp_processor_id() in preemptible [00000000]
> code: suspend_to_disk/2934
> [  188.666064] caller is native_sched_clock+0x2b/0x80

The cause seems to be commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5, 
"sched: optimize sched_clock() a bit" by Ingo.

Which actually comments on the fact that a few callers may need to be 
updated. That wasn't good. Ingo - it's not acceptable for a latish-rc 
patch to introduce _known_ bugs and not fixing everything up.

Of course, since Maciej has frame pointers disabled, the stack trace isn't 
entirely reliable, but it looks like the problem is "init_idle()".

That thing needs to call sched_clock() with interrupts disabled. Looking 
at it, I'd also expect that it should have used "sched_clock_cpu()", but 
I'm leaving that to Ingo to sort out. Ingo?

		Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
  2008-11-12 18:56 ` Linus Torvalds
@ 2008-11-12 19:21   ` Ingo Molnar
  2008-11-12 20:34     ` Maciej Rutecki
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2008-11-12 19:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maciej Rutecki, Linux Kernel Mailing List, Rafael J. Wysocki,
	lenb@kernel.org, linux-acpi


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 12 Nov 2008, Maciej Rutecki wrote:
> 
> > -rc3 works OK
> > 
> > I have this bug during suspend to disk:
> > 
> > [  188.592151] Enabling non-boot CPUs ...
> > [  188.592151] SMP alternatives: switching to SMP code
> > [  188.666058] BUG: using smp_processor_id() in preemptible [00000000]
> > code: suspend_to_disk/2934
> > [  188.666064] caller is native_sched_clock+0x2b/0x80
> 
> The cause seems to be commit 
> 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5, "sched: optimize 
> sched_clock() a bit" by Ingo.
> 
> Which actually comments on the fact that a few callers may need to 
> be updated. That wasn't good. Ingo - it's not acceptable for a 
> latish-rc patch to introduce _known_ bugs and not fixing everything 
> up.
>
> Of course, since Maciej has frame pointers disabled, the stack trace 
> isn't entirely reliable, but it looks like the problem is 
> "init_idle()".
> 
> That thing needs to call sched_clock() with interrupts disabled. 
> Looking at it, I'd also expect that it should have used 
> "sched_clock_cpu()", but I'm leaving that to Ingo to sort out. Ingo?

i've queued up the fix below in tip/sched/urgent - Maciej, could you 
please test it? I've build and boot tested it.

I picked the simplest solution - to widen the (init only) use of the 
rq lock there, which is irqsafe already.

Another question is: how/where does the hibernation code call 
init_idle() with interrupts and preemption enabled? All the normal 
init_idle() sequences are called with irqs off (and at least with 
preemption off), that's why my testing didnt catch this bug.

[ I was under the distinct illusion that i had all callers covered 
  both via review and via testing (there's no that many users of naked 
  sched_clock() left). So there was certainly no bug known to me - my 
  comment was directed at out-of-tree naked-sched_clock() users which 
  do exist. The recommended in-kernel API is cpu_clock(cpu) - except
  for scheduler-internal use which was the issue here. ]

But it was still a late change and the boot warning was unnecessary, 
sorry about that.

	Ingo

------------->
>From 5cbd54ef470d880fc37fbe4b21eb514806d51e0d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 12 Nov 2008 20:05:50 +0100
Subject: [PATCH] sched: fix init_idle()'s use of sched_clock()

Maciej Rutecki reported:

> I have this bug during suspend to disk:
>
> [  188.592151] Enabling non-boot CPUs ...
> [  188.592151] SMP alternatives: switching to SMP code
> [  188.666058] BUG: using smp_processor_id() in preemptible
> [00000000]
> code: suspend_to_disk/2934
> [  188.666064] caller is native_sched_clock+0x2b/0x80

Which, as noted by Linus, was caused by me, via:

  7cbaef9c "sched: optimize sched_clock() a bit"

Move the rq locking a bit earlier in the initialization sequence,
that will make the sched_clock() call in init_idle() non-preemptible.

Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3bafbe3..c94baf2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5870,6 +5870,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
+	spin_lock_irqsave(&rq->lock, flags);
+
 	__sched_fork(idle);
 	idle->se.exec_start = sched_clock();
 
@@ -5877,7 +5879,6 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	idle->cpus_allowed = cpumask_of_cpu(cpu);
 	__set_task_cpu(idle, cpu);
 
-	spin_lock_irqsave(&rq->lock, flags);
 	rq->curr = rq->idle = idle;
 #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
 	idle->oncpu = 1;


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
  2008-11-12 19:21   ` Ingo Molnar
@ 2008-11-12 20:34     ` Maciej Rutecki
  2008-11-12 20:40       ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej Rutecki @ 2008-11-12 20:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List, Rafael J. Wysocki,
	lenb@kernel.org, linux-acpi

2008/11/12 Ingo Molnar <mingo@elte.hu>:

>
> i've queued up the fix below in tip/sched/urgent - Maciej, could you
> please test it? I've build and boot tested it.
>

Suspend to disk works OK now. Thanks.

Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
>
> Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3bafbe3..c94baf2 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5870,6 +5870,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
>        struct rq *rq = cpu_rq(cpu);
>        unsigned long flags;
>
> +       spin_lock_irqsave(&rq->lock, flags);
> +
>        __sched_fork(idle);
>        idle->se.exec_start = sched_clock();
>
> @@ -5877,7 +5879,6 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
>        idle->cpus_allowed = cpumask_of_cpu(cpu);
>        __set_task_cpu(idle, cpu);
>
> -       spin_lock_irqsave(&rq->lock, flags);
>        rq->curr = rq->idle = idle;
>  #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
>        idle->oncpu = 1;
>
>

Regards
-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
  2008-11-12 20:34     ` Maciej Rutecki
@ 2008-11-12 20:40       ` Ingo Molnar
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2008-11-12 20:40 UTC (permalink / raw)
  To: Maciej Rutecki
  Cc: Linus Torvalds, Linux Kernel Mailing List, Rafael J. Wysocki,
	lenb@kernel.org, linux-acpi


* Maciej Rutecki <maciej.rutecki@gmail.com> wrote:

> 2008/11/12 Ingo Molnar <mingo@elte.hu>:
> 
> >
> > i've queued up the fix below in tip/sched/urgent - Maciej, could you
> > please test it? I've build and boot tested it.
> >
> 
> Suspend to disk works OK now. Thanks.
> 
> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>

thanks Maciej!

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-11-12 20:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-12 18:30 [Linux 2.6.28-rc4] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934 Maciej Rutecki
2008-11-12 18:56 ` Linus Torvalds
2008-11-12 19:21   ` Ingo Molnar
2008-11-12 20:34     ` Maciej Rutecki
2008-11-12 20:40       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox