linux-acpi.vger.kernel.org archive mirror
* [BUG] Oops on boot (probably ACPI related)
@ 2006-09-27 12:24 Rolf Eike Beer
  2006-09-27 17:56 ` Markus Dahms
  2006-09-27 19:38 ` Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2006-09-27 12:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: len.brown, linux-acpi


I get this on my machine. SMP kernel, linus git from this morning. .config and 
test available on request.

Eike

BUG: unable to handle kernel paging request at virtual address f0003504
 printing eip:
c102d804
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c102d804>]    Not tainted VLI
EFLAGS: 00010086   (2.6.18 #3)
EIP is at mark_lock+0x24/0x34c
eax: f00034ec   ebx: c126a674   ecx: 00000001   edx: 00000001
esi: c126a140   edi: 00000000   ebp: c1380e88   esp: c1380e78
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c1380000 task=c126a140 task.ti=c1380000)
Stack: 00000001 f00034ec 00000018 0000ffff c1380ec4 c102e5ee c11ee67a 00000000
       00000000 00000018 c126a140 c126a674 00000000 c126a674 00000000 00000001
       00000046 00000018 0000ffff c1380ee4 c102f03a 00000000 00000002 00000001
Call Trace:
 [<c102e5ee>] __lock_acquire+0x45e/0x967
 [<c102f03a>] lock_acquire+0x4b/0x6d
 [<c11f15ef>] _spin_lock_irqsave+0x22/0x32
 [<c11ee67a>] __down_trylock+0x12/0x48
 [<c11f0e76>] __down_failed_trylock+0xa/0x10
DWARF2 unwinder stuck at __down_failed_trylock+0xa/0x10

Leftover inexact backtrace:

 [<c10e9362>] acpi_os_wait_semaphore+0x38/0xd7
 [<c10ff1e6>] acpi_ut_acquire_mutex+0x39/0x77
 [<c10f768e>] acpi_ns_get_node+0x42/0x84
 [<c10f64e1>] acpi_ns_root_initialize+0x276/0x2ad
 [<c10fdd23>] acpi_initialize_subsystem+0x38/0x5d
 [<c1397f64>] acpi_early_init+0x4e/0x108
 [<c1384747>] start_kernel+0x376/0x383
 [<00000000>] 0x0
 =======================
Code: 8d 65 f8 5b 5e 5d c3 55 89 e5 57 56 53 83 ec 04 89 c6 89 d3 89 cf c7 45 
f0 01 00 00 00 d3 65 f0 8b 42 08 ba 01 00 00 00 8b 4d f0 <85> 48 18 0f 85 15 
03 00 00 f0 fe 0d 9c f8 26 c1 79 0d f3 90 80
EIP: [<c102d804>] mark_lock+0x24/0x34c SS:ESP 0068:c1380e78
 <0>Kernel panic - not syncing: Attempted to kill the idle task!
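
The faulting access can be read straight off the Code: line. The bytes at the marked position are 85 48 18: opcode 0x85 is TEST r/m32,r32, and the ModRM byte 0x48 selects ecx as the register operand and eax plus an 8-bit displacement as the memory operand, i.e. "test %ecx,0x18(%eax)". With eax = f00034ec that is a read from f0003504, the address in the BUG line. A minimal C sketch of that arithmetic follows; the helper names are hypothetical and not kernel code.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte (32-bit addressing, no SIB byte). */
struct modrm {
    unsigned mod; /* 1 means "base register + 8-bit displacement" */
    unsigned reg; /* register operand: 0=eax, 1=ecx, 2=edx, ...    */
    unsigned rm;  /* base register:    0=eax, 1=ecx, 2=edx, ...    */
};

/* Split a ModRM byte into its mod (bits 7-6), reg (5-3), rm (2-0) fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/* Effective address for mod == 1: base plus sign-extended disp8. */
static uint32_t effective_address(uint32_t base, uint8_t disp8)
{
    return base + (uint32_t)(int32_t)(int8_t)disp8;
}
```

Calling decode_modrm(0x48) gives mod=1, reg=1 (ecx), rm=0 (eax), and effective_address(0xf00034ec, 0x18) gives 0xf0003504, so the bad pointer is whatever mark_lock had in eax.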


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 12:24 [BUG] Oops on boot (probably ACPI related) Rolf Eike Beer
@ 2006-09-27 17:56 ` Markus Dahms
  2006-09-27 18:40   ` Kyle McMartin
  2006-09-27 19:38 ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Markus Dahms @ 2006-09-27 17:56 UTC (permalink / raw)
  To: linux-acpi; +Cc: linux-kernel

On Wed, 27 Sep 2006 14:24:47 +0200, Rolf Eike Beer wrote:

> I get this on my machine. SMP kernel, linus git from this morning. .config
> and test available on request.

I encountered a similar bug, but a lot earlier. It seems to be a locking
problem, as it is lockdep which does the BUG() for me.
It's also an SMP machine in my case, acpi_os_wait_semaphore() is in the
call chain, too. No textual output (no serial connection attached, too
early for netconsole), but a screenshot:

http://automagically.de/images/linux-2.6.18+-acpi-lockup.jpg (154kB)

2.6.18 works for me, newer git versions explode.

Maybe it's an SMP-related problem, but it does BUG() before initialization
of the second CPU.

Markus




* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 17:56 ` Markus Dahms
@ 2006-09-27 18:40   ` Kyle McMartin
  2006-09-27 19:38     ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Kyle McMartin @ 2006-09-27 18:40 UTC (permalink / raw)
  To: Markus Dahms; +Cc: linux-kernel, linux-acpi, akpm, torvalds

On Wed, Sep 27, 2006 at 07:56:13PM +0200, Markus Dahms wrote:
> > I get this on my machine. SMP kernel, linus git from this morning. .config
> > and test available on request.
> 

I saw this as well.

Reverting,
>       i386: Remove lock section support in semaphore.h

Fixes it for me (and apparently akpm too from Message-Id:
<20060926224114.5ca873ec.akpm@osdl.org>)

Linus, please revert 01215ad8d83e18321d99e9b5750a6f21cac243a2 for now...

Cheers,
	Kyle McMartin


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 12:24 [BUG] Oops on boot (probably ACPI related) Rolf Eike Beer
  2006-09-27 17:56 ` Markus Dahms
@ 2006-09-27 19:38 ` Andi Kleen
  2006-09-28  7:04   ` Rolf Eike Beer
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 19:38 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: len.brown, linux-acpi, linux-kernel, akpm

Rolf Eike Beer <eike-kernel@sf-tec.de> writes:

> I get this on my machine. SMP kernel, linus git from this morning. .config and 
> test available on request.

What gcc do you use?

Anyways, does this patch fix it? This might have been Andrew's vaio problem too.

-Andi

i386: Use early clobbers for semaphores now

The new code does clobber the result early, so make sure to tell
gcc not to put it into the same register as an input argument.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/semaphore.h
===================================================================
--- linux.orig/include/asm-i386/semaphore.h
+++ linux/include/asm-i386/semaphore.h
@@ -126,7 +126,7 @@ static inline int down_interruptible(str
 		"lea %1,%%eax\n\t"
 		"call __down_failed_interruptible\n"
 		"2:"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -148,7 +148,7 @@ static inline int down_trylock(struct se
 		"lea %1,%%eax\n\t"
 		"call __down_failed_trylock\n\t"
 		"2:\n"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
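
What the "&" buys can be shown with a small stand-alone function. This is a simplified sketch and not the kernel's semaphore code: try_dec is a hypothetical name, it uses a generic register constraint ("r") instead of "a", and it never calls a slow path. The asm writes the output register first and only afterwards touches the memory operand, which is exactly the situation an early clobber describes. Without "&", gcc is allowed to hold the address of *count in the same register it picked for result, and the first instruction would then destroy that address.

```c
/*
 * Decrement *count; return 0 if the new value is >= 0, else 1.
 * Simplified illustration of an early-clobber output operand.
 */
static inline int try_dec(int *count)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "xorl %0,%0\n\t"   /* result is written first ...              */
        "decl %1\n\t"      /* ... while the memory operand is still needed */
        "jns 1f\n\t"       /* new value not negative: keep result == 0 */
        "movl $1,%0\n"
        "1:"
        : "=&r" (result), "+m" (*count)  /* '&' marks the early clobber */
        :
        : "cc", "memory");
#else
    /* Plain C fallback so the sketch builds on other architectures. */
    result = (--*count < 0) ? 1 : 0;
#endif
    return result;
}
```

With an int c = 1, the first try_dec(&c) returns 0 and leaves c at 0; a second call returns 1 and leaves c at -1.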


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 18:40   ` Kyle McMartin
@ 2006-09-27 19:38     ` Andi Kleen
  2006-09-27 20:21       ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 19:38 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: linux-kernel, linux-acpi, akpm, torvalds

Kyle McMartin <kyle@parisc-linux.org> writes:

> On Wed, Sep 27, 2006 at 07:56:13PM +0200, Markus Dahms wrote:
> > > I get this on my machine. SMP kernel, linus git from this morning. .config
> > > and test available on request.
> > 
> 
> I saw this as well.
> 
> Reverting,
> >       i386: Remove lock section support in semaphore.h
> 
> Fixes it for me (and apparently akpm too from Message-Id:
> <20060926224114.5ca873ec.akpm@osdl.org>)
> 
> Linus, please revert 01215ad8d83e18321d99e9b5750a6f21cac243a2 for now...

I expect this patch to fix it.

-Andi


i386: Use early clobbers for semaphores now

The new code does clobber the result early, so make sure to tell
gcc not to put it into the same register as an input argument.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/semaphore.h
===================================================================
--- linux.orig/include/asm-i386/semaphore.h
+++ linux/include/asm-i386/semaphore.h
@@ -126,7 +126,7 @@ static inline int down_interruptible(str
 		"lea %1,%%eax\n\t"
 		"call __down_failed_interruptible\n"
 		"2:"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -148,7 +148,7 @@ static inline int down_trylock(struct se
 		"lea %1,%%eax\n\t"
 		"call __down_failed_trylock\n\t"
 		"2:\n"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 19:38     ` Andi Kleen
@ 2006-09-27 20:21       ` Linus Torvalds
  2006-09-27 20:35         ` Linus Torvalds
  2006-09-27 20:58         ` Kyle McMartin
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 20:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm



On Wed, 27 Sep 2006, Andi Kleen wrote:
> 
> I expect this patch to fix it.

Andrew, Kyle, can you verify?

		Linus


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 20:21       ` Linus Torvalds
@ 2006-09-27 20:35         ` Linus Torvalds
  2006-09-27 20:50           ` Andi Kleen
  2006-09-27 20:58         ` Kyle McMartin
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 20:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm



On Wed, 27 Sep 2006, Linus Torvalds wrote:
> 
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> > 
> > I expect this patch to fix it.
> 
> Andrew, Kyle, can you verify?

Not that it really matters. Andi sure as hell pinpointed a real problem 
with the new and broken inline asm. That's almost certainly the bug that 
crept in during the recent rewrite.

HOWEVER, now that I look more closely at the rewrite, I'm really wondering 
whether the rewrite was worth it at all. It generates smaller code, but at 
the expense of

 - the actual cache-footprint is bigger
 - the branch will now be mis-predicted by default

Since the "smaller code" really only tends to matter from a cache 
usage standpoint, I don't know if I'm at all convinced.
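
The shape being argued about (an inline fast path with a rarely taken call to a slow path) can be sketched in portable C. Everything here is an illustrative assumption and not the kernel's implementation: the toy_* names are hypothetical, C11 atomics stand in for the locked decl, the slow path simply undoes the decrement, and __builtin_expect is the GCC/Clang hint for which way the branch usually goes.

```c
#include <stdatomic.h>

/* Toy counting semaphore: count > 0 means it can be taken. */
struct toy_sem {
    atomic_int count;
};

/* Slow path, kept out of line so each call site only carries a call. */
__attribute__((noinline))
static int toy_trylock_slow(struct toy_sem *s)
{
    atomic_fetch_add(&s->count, 1);   /* undo the failed decrement */
    return 1;                         /* 1 = not acquired */
}

/* Fast path: one atomic decrement and one branch, inlined at the caller. */
static inline int toy_trylock(struct toy_sem *s)
{
    /* atomic_fetch_sub returns the old value; old > 0 means we got it. */
    if (__builtin_expect(atomic_fetch_sub(&s->count, 1) > 0, 1))
        return 0;                     /* 0 = acquired */
    return toy_trylock_slow(s);
}
```

Because toy_trylock is inlined, every caller gets its own copy of the branch. That is the case where code size and static prediction of that branch matter most.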

The fact that unwinders have problems is fairly immaterial. Maybe we 
should just take this as a hint that all the stupid unwinding code was 
wrong in the first place, and we should stop doing that? We can go back 
to just printing out our stacktrace guesses, that has worked for us for a 
long time, and the stack unwinding simply looks _fundamentally_ flawed.

So I have a real urge to just revert that change anyway.

Are there any _real_ advantages to this broken unwinding code that has had 
more bugs than Windows XP?

		Linus


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 20:35         ` Linus Torvalds
@ 2006-09-27 20:50           ` Andi Kleen
  2006-09-27 21:38             ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 20:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich

On Wednesday 27 September 2006 22:35, Linus Torvalds wrote:
> 
> On Wed, 27 Sep 2006, Linus Torvalds wrote:
> > 
> > On Wed, 27 Sep 2006, Andi Kleen wrote:
> > > 
> > > I expect this patch to fix it.
> > 
> > Andrew, Kyle, can you verify?
> 
> Not that it really matters. Andi sure as hell pinpointed a real problem 
> with the new and broken inline asm. That's almost certainly the bug that 
> crept in during the recent rewrite.
> 
> HOWEVER, now that I look more closely at the rewrite, I'm really wondering 
> whether the rewrite was worth it at all. It generates smaller code, but at 
> the expense of
> 
>  - the actual cache-footprint is bigger
>  - the branch will now be mis-predicted by default

It doesn't matter much because these days this stuff is all out of lined
anyways and in a single function. And the dynamic branch predictor
in all modern CPUs will usually cache the decision (unlocked) there.

(Actually there is something dumb left -- on a non-preempt kernel the
call to spin_unlock is larger than doing it inline. But that is left
for fixing later.)
 
> The fact that unwinders have problems is fairly immaterial. Maybe we 
> should just take this as a hint that all the stupid unwinding code was 
> wrong in the first place, and we should stop doing that? We can go back 
> to just printing out our stacktrace guesses, that has worked for us for a 
> long time, and the stack unwinding simply looks _fundamentally_ flawed.

Unfortunately Linux is a lot more complex than it was in the early days.
 
> So I have a real urge to just revert that change anyway.
> 
> Are there any _real_ advantages to this broken unwinding code that has had 
> more bugs than Windows XP?

I thought for a long time we didn't need it either, but these days, with all 
these callbacks in some parts of the kernel (driver model, others), when you 
get an oops with 60+ entries it is just too much trouble to figure it out manually.

I admit when I took the code I didn't realize that dwarf2 has these
problems (not supporting out of line sections is clearly a spec
bug and would even hit gcc generated code). But we don't have 
that many out of line sections anyways, so it's not that big an issue. 

And all the people who process a lot of oopses (e.g. Andrew, Ingo, others) tend
to use frame pointers by default anyways. They already voted with their feet.
And the unwinder certainly gives better code than frame pointers. The mispredicted
branches you're worrying about are nothing against frame pointers 
(e.g. on K8 FP tends to stall the CPU on each function call slightly)

Anyways, in theory it would be possible to keep the out of line sections
and define some own dwarf2 extension that allows us to express them.
Jan might have some thoughts on it. But I didn't think it was worth
it for these cases due to the reasons above.

-Andi


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 20:21       ` Linus Torvalds
  2006-09-27 20:35         ` Linus Torvalds
@ 2006-09-27 20:58         ` Kyle McMartin
  2006-09-27 22:32           ` Andrew Morton
  1 sibling, 1 reply; 13+ messages in thread
From: Kyle McMartin @ 2006-09-27 20:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Kyle McMartin, linux-kernel, linux-acpi, akpm

On Wed, Sep 27, 2006 at 01:21:17PM -0700, Linus Torvalds wrote:
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> > I expect this patch to fix it.
> 
> Andrew, Kyle, can you verify?
> 

Yup, it works. (For reference, it's gcc 4.1.1-13 from Debian.)

Cheers,
	Kyle M.


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 20:50           ` Andi Kleen
@ 2006-09-27 21:38             ` Linus Torvalds
  2006-09-28  7:49               ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 21:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich



On Wed, 27 Sep 2006, Andi Kleen wrote:
> 
> It doesn't matter much because these days this stuff is all out of lined
> anyways and in a single function. And the dynamic branch predictor
> in all modern CPUs will usually cache the decision (unlocked) there.

Ahh, good point. Once there's only one copy, the branch predictor will get 
it right (and the code size won't much matter)

> > Are there any _real_ advantages to this broken unwinding code that has had 
> > more bugs than Windows XP?
> 
> I thought for a long time we didn't need it either, but these days, with all 
> these callbacks in some parts of the kernel (driver model, others), when you 
> get an oops with 60+ entries it is just too much trouble to figure it out manually.

Ok, fair enough. I'll apply your fix (which in itself is obviously 
correct).

I just wanted to bring up the possibility that we should just remove the 
(fragile) unwinder.

But let's leave it for another day, if it keeps being problematic.

		Linus


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 20:58         ` Kyle McMartin
@ 2006-09-27 22:32           ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2006-09-27 22:32 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: Linus Torvalds, Andi Kleen, linux-kernel, linux-acpi

On Wed, 27 Sep 2006 16:58:05 -0400
Kyle McMartin <kyle@parisc-linux.org> wrote:

> On Wed, Sep 27, 2006 at 01:21:17PM -0700, Linus Torvalds wrote:
> > On Wed, 27 Sep 2006, Andi Kleen wrote:
> > > I expect this patch to fix it.
> > 
> > Andrew, Kyle, can you verify?
> > 
> 
> Yup, it works.

Ditto.


* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 19:38 ` Andi Kleen
@ 2006-09-28  7:04   ` Rolf Eike Beer
  0 siblings, 0 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2006-09-28  7:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: len.brown, linux-acpi, linux-kernel, akpm


On Wednesday, 27 September 2006 21:38, Andi Kleen wrote:
> Rolf Eike Beer <eike-kernel@sf-tec.de> writes:
> > I get this on my machine. SMP kernel, linus git from this morning.
> > .config and test available on request.
>
> What gcc do you use?

4.1.0 (SuSE 10.1)

> Anyways, does this patch fix it? This might have been Andrew's vaio problem
> too.

Looks good. Now it hangs because the init script seems to have problems 
activating the root volume group, but that's a different story.

Eike



* Re: [BUG] Oops on boot (probably ACPI related)
  2006-09-27 21:38             ` Linus Torvalds
@ 2006-09-28  7:49               ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2006-09-28  7:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich

On Wednesday 27 September 2006 23:38, Linus Torvalds wrote:
> 
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> > 
> > It doesn't matter much because these days this stuff is all out of lined
> > anyways and in a single function. And the dynamic branch predictor
> > in all modern CPUs will usually cache the decision (unlocked) there.
> 
> Ahh, good point. Once there's only one copy, the branch predictor will get 
> it right (and the code size won't much matter)

As a postscript, I (unintentionally) bent the truth on that one
yesterday. Sorry for that. Semaphores are still inline, unlike spinlocks.

However if the spinlocks are out of line I see no reason to keep semaphores
inline either, so perhaps it would be better to just move them. Then my
argument above would actually work :)

For some reason the unwinder also still seems to get stuck on it :/

-Andi


end of thread, other threads:[~2006-09-28  7:49 UTC | newest]

Thread overview: 13+ messages
2006-09-27 12:24 [BUG] Oops on boot (probably ACPI related) Rolf Eike Beer
2006-09-27 17:56 ` Markus Dahms
2006-09-27 18:40   ` Kyle McMartin
2006-09-27 19:38     ` Andi Kleen
2006-09-27 20:21       ` Linus Torvalds
2006-09-27 20:35         ` Linus Torvalds
2006-09-27 20:50           ` Andi Kleen
2006-09-27 21:38             ` Linus Torvalds
2006-09-28  7:49               ` Andi Kleen
2006-09-27 20:58         ` Kyle McMartin
2006-09-27 22:32           ` Andrew Morton
2006-09-27 19:38 ` Andi Kleen
2006-09-28  7:04   ` Rolf Eike Beer
