* [BUG] Oops on boot (probably ACPI related)
@ 2006-09-27 12:24 Rolf Eike Beer
2006-09-27 17:56 ` Markus Dahms
2006-09-27 19:38 ` Andi Kleen
0 siblings, 2 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2006-09-27 12:24 UTC (permalink / raw)
To: linux-kernel; +Cc: len.brown, linux-acpi
[-- Attachment #1: Type: text/plain, Size: 1877 bytes --]
I get this on my machine. SMP kernel, linus git from this morning. .config and
test available on request.
Eike
BUG: unable to handle kernel paging request at virtual address f0003504
printing eip:
c102d804
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c102d804>] Not tainted VLI
EFLAGS: 00010086 (2.6.18 #3)
EIP is at mark_lock+0x24/0x34c
eax: f00034ec ebx: c126a674 ecx: 00000001 edx: 00000001
esi: c126a140 edi: 00000000 ebp: c1380e88 esp: c1380e78
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c1380000 task=c126a140 task.ti=c1380000)
Stack: 00000001 f00034ec 00000018 0000ffff c1380ec4 c102e5ee c11ee67a 00000000
00000000 00000018 c126a140 c126a674 00000000 c126a674 00000000 00000001
00000046 00000018 0000ffff c1380ee4 c102f03a 00000000 00000002 00000001
Call Trace:
[<c102e5ee>] __lock_acquire+0x45e/0x967
[<c102f03a>] lock_acquire+0x4b/0x6d
[<c11f15ef>] _spin_lock_irqsave+0x22/0x32
[<c11ee67a>] __down_trylock+0x12/0x48
[<c11f0e76>] __down_failed_trylock+0xa/0x10
DWARF2 unwinder stuck at __down_failed_trylock+0xa/0x10
Leftover inexact backtrace:
[<c10e9362>] acpi_os_wait_semaphore+0x38/0xd7
[<c10ff1e6>] acpi_ut_acquire_mutex+0x39/0x77
[<c10f768e>] acpi_ns_get_node+0x42/0x84
[<c10f64e1>] acpi_ns_root_initialize+0x276/0x2ad
[<c10fdd23>] acpi_initialize_subsystem+0x38/0x5d
[<c1397f64>] acpi_early_init+0x4e/0x108
[<c1384747>] start_kernel+0x376/0x383
[<00000000>] 0x0
=======================
Code: 8d 65 f8 5b 5e 5d c3 55 89 e5 57 56 53 83 ec 04 89 c6 89 d3 89 cf c7 45
f0 01 00 00 00 d3 65 f0 8b 42 08 ba 01 00 00 00 8b 4d f0 <85> 48 18 0f 85 15
03 00 00 f0 fe 0d 9c f8 26 c1 79 0d f3 90 80
EIP: [<c102d804>] mark_lock+0x24/0x34c SS:ESP 0068:c1380e78
<0>Kernel panic - not syncing: Attempted to kill the idle task!
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 12:24 [BUG] Oops on boot (probably ACPI related) Rolf Eike Beer
@ 2006-09-27 17:56 ` Markus Dahms
2006-09-27 18:40 ` Kyle McMartin
2006-09-27 19:38 ` Andi Kleen
1 sibling, 1 reply; 13+ messages in thread
From: Markus Dahms @ 2006-09-27 17:56 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-acpi

On Wed, 27 Sep 2006 14:24:47 +0200, Rolf Eike Beer wrote:
> I get this on my machine. SMP kernel, linus git from this morning. .config
> and test available on request.

I encountered a similar bug, but a lot earlier. It seems to be a locking
problem, as it is lockdep which does the BUG() for me. It's also an SMP
machine in my case, acpi_os_wait_semaphore() is in the call chain, too.

No textual output (no serial connection attached, too early for netconsole),
but a screenshot:
http://automagically.de/images/linux-2.6.18+-acpi-lockup.jpg (154kB)

2.6.18 works for me, newer git versions explode. Maybe it's an SMP-related
problem, but it does BUG() before initialization of the second CPU.

Markus
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 17:56 ` Markus Dahms
@ 2006-09-27 18:40 ` Kyle McMartin
2006-09-27 19:38 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Kyle McMartin @ 2006-09-27 18:40 UTC (permalink / raw)
To: Markus Dahms; +Cc: linux-kernel, linux-acpi, akpm, torvalds

On Wed, Sep 27, 2006 at 07:56:13PM +0200, Markus Dahms wrote:
> > I get this on my machine. SMP kernel, linus git from this morning. .config
> > and test available on request.
>

I saw this as well.

Reverting,
> i386: Remove lock section support in semaphore.h

Fixes it for me (and apparently akpm too from Message-Id:
<20060926224114.5ca873ec.akpm@osdl.org>)

Linus, please revert 01215ad8d83e18321d99e9b5750a6f21cac243a2 for now...

Cheers,
Kyle McMartin
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 18:40 ` Kyle McMartin
@ 2006-09-27 19:38 ` Andi Kleen
2006-09-27 20:21 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 19:38 UTC (permalink / raw)
To: Kyle McMartin; +Cc: linux-kernel, linux-acpi, akpm, torvalds

Kyle McMartin <kyle@parisc-linux.org> writes:

> On Wed, Sep 27, 2006 at 07:56:13PM +0200, Markus Dahms wrote:
> > > I get this on my machine. SMP kernel, linus git from this morning. .config
> > > and test available on request.
> >
>
> I saw this as well.
>
> Reverting,
> > i386: Remove lock section support in semaphore.h
>
> Fixes it for me (and apparently akpm too from Message-Id:
> <20060926224114.5ca873ec.akpm@osdl.org>)
>
> Linus, please revert 01215ad8d83e18321d99e9b5750a6f21cac243a2 for now...

I expect this patch to fix it.

-Andi

i386: Use early clobbers for semaphores now

The new code does clobber the result early, so make sure to tell
gcc to not put it into the same register as a input argument

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/semaphore.h
===================================================================
--- linux.orig/include/asm-i386/semaphore.h
+++ linux/include/asm-i386/semaphore.h
@@ -126,7 +126,7 @@ static inline int down_interruptible(str
 		"lea %1,%%eax\n\t"
 		"call __down_failed_interruptible\n"
 		"2:"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -148,7 +148,7 @@ static inline int down_trylock(struct se
 		"lea %1,%%eax\n\t"
 		"call __down_failed_trylock\n\t"
 		"2:\n"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
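
For readers unfamiliar with the "&" modifier in the patch above, here is a
minimal sketch of what an early-clobber output constraint does. It is not
the kernel's semaphore code: the helper name one_plus() is hypothetical, and
the example assumes GCC-style extended asm on x86 (with a plain C fallback
elsewhere). The asm writes its output before it reads its input, which is
the situation the commit message describes. Without "&" the compiler is
allowed to place the output and an input in the same register, and the
early write could then destroy the input.

```c
/*
 * Hypothetical helper, for illustration only (not from the kernel tree).
 * Returns x + 1. On x86 the asm deliberately writes the output operand
 * first and reads the input operand afterwards.
 */
static inline int one_plus(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	int result;

	__asm__ (
		"movl $1,%0\n\t"	/* output register is written first ... */
		"addl %1,%0"		/* ... and only then is the input read  */
		: "=&r" (result)	/* '&': must not share a register with x */
		: "r" (x)
		: "cc");
	return result;
#else
	/* Portable fallback so the sketch still builds on other targets. */
	return x + 1;
#endif
}
```

With "=r" instead of "=&r" the compiler may still happen to pick distinct
registers, so the missing modifier can go unnoticed until a different
compiler version or optimization level changes the allocation.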
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 19:38 ` Andi Kleen
@ 2006-09-27 20:21 ` Linus Torvalds
2006-09-27 20:35 ` Linus Torvalds
2006-09-27 20:58 ` Kyle McMartin
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 20:21 UTC (permalink / raw)
To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm

On Wed, 27 Sep 2006, Andi Kleen wrote:
>
> I expect this patch to fix it.

Andrew, Kyle, can you verify?

Linus
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 20:21 ` Linus Torvalds
@ 2006-09-27 20:35 ` Linus Torvalds
2006-09-27 20:50 ` Andi Kleen
2006-09-27 20:58 ` Kyle McMartin
1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 20:35 UTC (permalink / raw)
To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm

On Wed, 27 Sep 2006, Linus Torvalds wrote:
>
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> >
> > I expect this patch to fix it.
>
> Andrew, Kyle, can you verify?

Not that it really matters. Andi sure as hell pinpointed a real problem
with the new and broken inline asm. That's almost certainly the bug that
crept in during the recent rewrite.

HOWEVER, now that I look more closely at the rewrite, I'm really wondering
whether the rewrite was worth it at all. It generates smaller code, but at
the expense of

- the actual cache-footprint is bigger
- the branch will now be mis-predicted by default

Since the "smaller code" really only tends to matter from a cache usage
standpoint, I don't know if I'm at all convinced.

The fact that rewinders have problems is fairly immaterial. Maybe we
should just take this as a hint that all the stupid rewinding code was
wrong in the first place, and we should stop doing that? We can go back
to just printing out our stacktrace guesses, that has worked for us for a
long time, and the stack unwinding simply looks _fundamentally_ flawed.

So I have a real urge to just revert that change anyway.

Are there any _real_ advantages to this broken unwinding code that has had
more bugs than Windows XP?

Linus
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 20:35 ` Linus Torvalds
@ 2006-09-27 20:50 ` Andi Kleen
2006-09-27 21:38 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 20:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich

On Wednesday 27 September 2006 22:35, Linus Torvalds wrote:
>
> On Wed, 27 Sep 2006, Linus Torvalds wrote:
> >
> > On Wed, 27 Sep 2006, Andi Kleen wrote:
> > >
> > > I expect this patch to fix it.
> >
> > Andrew, Kyle, can you verify?
>
> Not that it really matters. Andi sure as hell pinpointed a real problem
> with the new and broken inline asm. That's almost certainly the bug that
> crept in during the recent rewrite.
>
> HOWEVER, now that I look more closely at the rewrite, I'm really wondering
> whether the rewrite was worth it at all. It generates smaller code, but at
> the expense of
>
> - the actual cache-footprint is bigger
> - the branch will now be mis-predicted by default

It doesn't matter much because these days this stuff is all out of lined
anyways and in a single function. And the dynamic branch predictor
in all modern CPUs will usually cache the decision (unlocked) there.

(Actually there is something dumb left -- on a non preempt kernel
spin_unlock caller is larger than doing it inline. But that is left
for fixing later)

> The fact that rewinders have problems is fairly immaterial. Maybe we
> should just take this as a hint that all the stupid rewinding code was
> wrong in the first place, and we should stop doing that? We can go back
> to just printing out our stacktrace guesses, that has worked for us for a
> long time, and the stack unwinding simply looks _fundamentally_ flawed.

Unfortunately Linux is a lot more complex than it was in the early days.

> So I have a real urge to just revert that change anyway.
>
> Are there any _real_ advantages to this broken unwinding code that has had
> more bugs than Windows XP?

I thought for a long time we didn't need it either, but these days with all
these callbacks in some parts of the kernel (driver model, others) and you
get an oops with 60+ entries it is just too much trouble to figure it out
manually.

I admit when I took the code I didn't realize that dwarf2 has these problems
(not supporting out of line sections is clearly a spec bug and would even hit
gcc generated code). But we don't have that many out of line sections
anyways, so it's not that big an issue.

And all the people who process a lot of oopses (e.g. Andrew, Ingo, others)
tend to use frame pointers by default anyways. They already voted with their
feet. And the unwinder certainly gives better code than frame pointers.
The mispredicted branches you're worrying about are nothing against
frame pointers (e.g. on K8 FP tends to stall the CPU on each function
call slightly)

Anyways, in theory it would be possible to keep the out of line sections
and define some own dwarf2 extension that allows us to express them.
Jan might have some thoughts on it. But I didn't think it was worth
it for these cases due to the reasons above.

-Andi
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 20:50 ` Andi Kleen
@ 2006-09-27 21:38 ` Linus Torvalds
2006-09-28 7:49 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2006-09-27 21:38 UTC (permalink / raw)
To: Andi Kleen; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich

On Wed, 27 Sep 2006, Andi Kleen wrote:
>
> It doesn't matter much because these days this stuff is all out of lined
> anyways and in a single function. And the dynamic branch predictor
> in all modern CPUs will usually cache the decision (unlocked) there.

Ahh, good point. Once there's only one copy, the branch predictor will get
it right (and the code size won't much matter)

> > Are there any _real_ advantages to this broken unwinding code that has had
> > more bugs than Windows XP?
>
> I thought for a long time we didn't need it either, but these days with all
> these callbacks in some parts of the kernel (driver model, others) and you
> get an oops with 60+ entries it is just too much trouble to figure it out
> manually.

Ok, fair enough. I'll apply your fix (which in itself is obviously
correct). I just wanted to bring up the possibility that we should just
remove the (fragile) unwinder. But let's leave it for another day, if it
keeps being problematic.

Linus
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 21:38 ` Linus Torvalds
@ 2006-09-28 7:49 ` Andi Kleen
0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2006-09-28 7:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kyle McMartin, linux-kernel, linux-acpi, akpm, jbeulich

On Wednesday 27 September 2006 23:38, Linus Torvalds wrote:
>
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> >
> > It doesn't matter much because these days this stuff is all out of lined
> > anyways and in a single function. And the dynamic branch predictor
> > in all modern CPUs will usually cache the decision (unlocked) there.
>
> Ahh, good point. Once there's only one copy, the branch predictor will get
> it right (and the code size won't much matter)

As a postscript I (unintentionally) bent the truth on that one actually
yesterday. Sorry for that. Semaphores are still inline, unlike spinlocks.

However if the spinlocks are out of line I see no reason to keep semaphores
inline either, so perhaps it would be better to just move them. Then my
argument above would actually work :)

For some reason the unwinder also still seems to get stuck on it :/

-Andi
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 20:21 ` Linus Torvalds
2006-09-27 20:35 ` Linus Torvalds
@ 2006-09-27 20:58 ` Kyle McMartin
2006-09-27 22:32 ` Andrew Morton
1 sibling, 1 reply; 13+ messages in thread
From: Kyle McMartin @ 2006-09-27 20:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andi Kleen, Kyle McMartin, linux-kernel, linux-acpi, akpm

On Wed, Sep 27, 2006 at 01:21:17PM -0700, Linus Torvalds wrote:
> On Wed, 27 Sep 2006, Andi Kleen wrote:
> > I expect this patch to fix it.
>
> Andrew, Kyle, can you verify?
>

Yup, it works.

(For reference, it's gcc 4.1.1-13 from Debian.)

Cheers,
Kyle M.
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 20:58 ` Kyle McMartin
@ 2006-09-27 22:32 ` Andrew Morton
0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2006-09-27 22:32 UTC (permalink / raw)
To: Kyle McMartin; +Cc: Linus Torvalds, Andi Kleen, linux-kernel, linux-acpi

On Wed, 27 Sep 2006 16:58:05 -0400 Kyle McMartin <kyle@parisc-linux.org> wrote:

> On Wed, Sep 27, 2006 at 01:21:17PM -0700, Linus Torvalds wrote:
> > On Wed, 27 Sep 2006, Andi Kleen wrote:
> > > I expect this patch to fix it.
> >
> > Andrew, Kyle, can you verify?
> >
>
> Yup, it works.

Ditto.
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 12:24 [BUG] Oops on boot (probably ACPI related) Rolf Eike Beer
2006-09-27 17:56 ` Markus Dahms
@ 2006-09-27 19:38 ` Andi Kleen
2006-09-28 7:04 ` Rolf Eike Beer
1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-09-27 19:38 UTC (permalink / raw)
To: Rolf Eike Beer; +Cc: len.brown, linux-acpi, linux-kernel, akpm

Rolf Eike Beer <eike-kernel@sf-tec.de> writes:

> I get this on my machine. SMP kernel, linus git from this morning. .config and
> test available on request.

What gcc do you use?

Anyways, does this patch fix it? This might have been Andrew's vaio problem
too.

-Andi

i386: Use early clobbers for semaphores now

The new code does clobber the result early, so make sure to tell
gcc to not put it into the same register as a input argument

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/semaphore.h
===================================================================
--- linux.orig/include/asm-i386/semaphore.h
+++ linux/include/asm-i386/semaphore.h
@@ -126,7 +126,7 @@ static inline int down_interruptible(str
 		"lea %1,%%eax\n\t"
 		"call __down_failed_interruptible\n"
 		"2:"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -148,7 +148,7 @@ static inline int down_trylock(struct se
 		"lea %1,%%eax\n\t"
 		"call __down_failed_trylock\n\t"
 		"2:\n"
-		:"=a" (result), "+m" (sem->count)
+		:"=&a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
* Re: [BUG] Oops on boot (probably ACPI related)
2006-09-27 19:38 ` Andi Kleen
@ 2006-09-28 7:04 ` Rolf Eike Beer
0 siblings, 0 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2006-09-28 7:04 UTC (permalink / raw)
To: Andi Kleen; +Cc: len.brown, linux-acpi, linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 511 bytes --]

On Wednesday, 27 September 2006 21:38, Andi Kleen wrote:
> Rolf Eike Beer <eike-kernel@sf-tec.de> writes:
> > I get this on my machine. SMP kernel, linus git from this morning.
> > .config and test available on request.
>
> What gcc do you use?

4.1.0 (SuSE 10.1)

> Anyways, does this patch fix it? This might have been Andrew's vaio problem
> too.

Looks good, now it hangs because the init script seems to have problems
activating the root volume group. But that's a different story.

Eike
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]