* enabling lockdep causes boot failure on a SUN T2000
@ 2008-05-08 22:39 Bjoern B. Brandenburg
2008-05-08 23:07 ` David Miller
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Bjoern B. Brandenburg @ 2008-05-08 22:39 UTC (permalink / raw)
To: sparclinux
Hi all,
I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25)
to boot on our Niagara box.
With many configurations, the system gets "stuck" during boot when
executing init from the initramfs (Ubuntu). I've been test-booting
various configurations now for three days, and I've finally been able to
narrow it down to what appears to be lockdep.
With lockdep disabled, the system boots fine and runs stable. With
lockdep enabled on 2.6.24, the system starts executing the init scripts
from the initramfs and then just sits there. With 2.6.25, the system
also dies but I had more luck and could capture some debug output.
The case that works:
config: http://www.cs.unc.edu/~bbb/linux/config-boots-fine
boot: http://www.cs.unc.edu/~bbb/linux/boot.ok.txt
The case that doesn't work:
config: http://www.cs.unc.edu/~bbb/linux/config-stuck-initramfs
boot: http://www.cs.unc.edu/~bbb/linux/boot.fail.txt
The only difference is that I enabled lockdep (see diff below). This is
problematic for us since we would like to use lockdep to debug a patch
that we are working on.
Please let me know how we can help to debug this.
Thanks,
Bjoern
--- config-boots-fine 2008-05-08 17:39:44.000000000 -0400
+++ config-stuck-initramfs 2008-05-08 17:18:25.000000000 -0400
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.25
-# Thu May 8 17:21:04 2008
+# Thu May 8 16:58:26 2008
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
@@ -977,19 +979,24 @@
CONFIG_DEBUG_PREEMPT=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
-# CONFIG_DEBUG_SPINLOCK is not set
-# CONFIG_DEBUG_MUTEXES is not set
-# CONFIG_DEBUG_LOCK_ALLOC is not set
-# CONFIG_PROVE_LOCKING is not set
+CONFIG_DEBUG_SPINLOCK=y
+CONFIG_DEBUG_MUTEXES=y
+CONFIG_DEBUG_LOCK_ALLOC=y
+CONFIG_PROVE_LOCKING=y
+CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
+# CONFIG_DEBUG_LOCKDEP is not set
+CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
+CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
+CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
* Re: enabling lockdep causes boot failure on a SUN T2000
2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
@ 2008-05-08 23:07 ` David Miller
2008-05-09 1:50 ` Bjoern B. Brandenburg
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-08 23:07 UTC (permalink / raw)
To: sparclinux
From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
Date: Thu, 08 May 2008 18:39:33 -0400
> I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25)
> to boot on our Niagara box.
We know about this problem already. Thanks for reporting.
This changeset below was meant to improve things, but t1000/t2000
systems still get wedged.
I'll investigate when I get a chance.
commit 85a793533524f333e8d630dc22450e574b7e08d2
Author: David S. Miller <davem@davemloft.net>
Date: Mon Mar 24 20:06:24 2008 -0700
[SPARC64]: Make save_stack_trace() more efficient.
Doing a 'flushw' every stack trace capture creates so much overhead
that it makes lockdep next to unusable.
We only care about the frame pointer chain and the function caller
program counters, so flush those by hand to the stack frame.
This is significantly more efficient than a 'flushw' because:
1) We only save 16 bytes per active register window to the stack.
2) This doesn't push the entire register window context of the current
call chain out of the cpu, forcing register window fill traps as we
return back down.
Note that we can't use 'restore' and 'save' instructions to move
around the register windows because that wouldn't work on Niagara
processors. They optimize 'save' into a new register window by
simply clearing out the registers instead of pulling them in from
the on-chip register window backing store.
Based upon a report by Tom Callaway.
Signed-off-by: David S. Miller <davem@davemloft.net>
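[Editor's sketch] The commit message above says that only the frame pointer chain and the caller program counters are needed for a trace. The following is a minimal user-space model of that idea, not the sparc64 implementation: struct sim_frame, struct sim_trace and sim_save_stack_trace() are hypothetical names invented for illustration, and the frames are ordinary C structs rather than real register windows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two values kept per register window:
 * the link to the caller's frame and the caller's program counter. */
struct sim_frame {
	const struct sim_frame *fp;	/* older frame, NULL at the bottom */
	unsigned long caller_pc;	/* return address recorded for this frame */
};

/* Simplified stand-in for a stack trace buffer (assumed layout). */
struct sim_trace {
	unsigned int nr_entries;	/* entries recorded so far */
	unsigned int max_entries;	/* capacity of entries[] */
	unsigned long *entries;		/* caller-provided storage */
	int skip;			/* innermost frames to leave out */
};

/* Walk the frame pointer chain and record one PC per frame,
 * honoring skip and stopping when the buffer is full. */
static void sim_save_stack_trace(struct sim_trace *trace,
				 const struct sim_frame *fp)
{
	int skip;

	assert(trace != NULL);
	skip = trace->skip;

	while (fp != NULL && trace->nr_entries < trace->max_entries) {
		if (skip > 0)
			skip--;
		else
			trace->entries[trace->nr_entries++] = fp->caller_pc;
		fp = fp->fp;
	}
}
```

In this model, a chain of three frames walked with skip set to 1 records the two outer program counters, and a max_entries of 1 truncates the walk after the first recorded entry.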
* Re: enabling lockdep causes boot failure on a SUN T2000
2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
2008-05-08 23:07 ` David Miller
@ 2008-05-09 1:50 ` Bjoern B. Brandenburg
2008-05-09 4:21 ` David Miller
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Bjoern B. Brandenburg @ 2008-05-09 1:50 UTC (permalink / raw)
To: sparclinux
David Miller wrote:
> From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
>> I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25)
>> to boot on our Niagara box.
>
> We know about this problem already. Thanks for reporting.
>
> This changeset below was meant to improve things, but t1000/t2000
> systems still get wedged.
Thanks for the pointer.
Based on your commit, I tried to disable stack trace support (with the
intent to get at least some output in the event of deadlock) with the
following patch:
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index e2c07ec..b468790 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -342,7 +342,9 @@ static int save_trace(struct stack_trace *trace)
trace->skip = 3;
- save_stack_trace(trace);
+ /* Work around: this doesn't work on the T2000.
+ * save_stack_trace(trace);
+ */
trace->max_entries = trace->nr_entries;
I figured that if the flushes are the root cause, then this should take
care of the problem. However, the kernel still doesn't boot.
Is there anything else that I can disable to get a partially working
lockdep?
Thanks,
Bjoern
* Re: enabling lockdep causes boot failure on a SUN T2000
2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
2008-05-08 23:07 ` David Miller
2008-05-09 1:50 ` Bjoern B. Brandenburg
@ 2008-05-09 4:21 ` David Miller
2008-05-09 4:55 ` Bjoern Brandenburg
2008-05-09 5:02 ` David Miller
4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-09 4:21 UTC (permalink / raw)
To: sparclinux
From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
Date: Thu, 08 May 2008 21:50:06 -0400
> I figured that if the flushes are the root cause, then this should take
> care of the problem. However, the kernel still doesn't boot.
That patch you posted is dangerous. If the stack isn't flushed,
there will be garbage in the frame pointer slots of the stack
frame, and that will likely cause an OOPS.
Please, let me try to resolve this as I find time to do so. I am the
only person who knows enough about all of these internals to solve the
problem at this time.
* Re: enabling lockdep causes boot failure on a SUN T2000
2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
` (2 preceding siblings ...)
2008-05-09 4:21 ` David Miller
@ 2008-05-09 4:55 ` Bjoern Brandenburg
2008-05-09 5:02 ` David Miller
4 siblings, 0 replies; 6+ messages in thread
From: Bjoern Brandenburg @ 2008-05-09 4:55 UTC (permalink / raw)
To: sparclinux
On May 9, 2008, at 12:21 AM, David Miller wrote:
>> I figured that if the flushes are the root cause, then this should take
>> care of the problem. However, the kernel still doesn't boot.
>
> That patch you posted is dangerous. If the stack isn't flushed,
> there will be garbage in the frame pointer slots of the stack
> frame, and likely that will cause an OOPS.
>
That was not intended as a proper patch; it was just a quick hack
to help me solve my problems.
But I'm curious. How is it dangerous not to call save_stack_trace()?
No frame pointers will be accessed since no stack trace is recorded.
If nothing is recorded, how can they be accessed later?
trace->nr_entries just remains 0, and print_stack_trace() properly
handles stack traces of length 0. I assume the rest of lockdep is
robust enough to handle such a corner case, too.
(I'm looking at the 2.6.24 source, since that is what I'm working with
atm.)
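[Editor's sketch] The argument above can be modeled in user space as follows. This is only an illustration of the zero-length-trace reasoning, under the assumption that the trace is set up roughly as in the patch context shown earlier in the thread; struct model_trace, model_save_trace_without_capture() and model_print_stack_trace() are hypothetical names, not kernel functions, and the sketch says nothing about how the rest of lockdep behaves.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a stack trace record (assumed layout). */
struct model_trace {
	unsigned int nr_entries;	/* entries actually recorded */
	unsigned int max_entries;	/* capacity, later clamped to nr_entries */
	unsigned long *entries;		/* storage for recorded PCs */
	int skip;			/* frames to skip during capture */
};

/* Models the hacked save path: the trace is initialized, but the
 * capture call is left out, so nothing is ever written to entries[]. */
static void model_save_trace_without_capture(struct model_trace *trace,
					     unsigned long *storage,
					     unsigned int room)
{
	assert(trace != NULL);

	trace->nr_entries = 0;
	trace->max_entries = room;
	trace->entries = storage;
	trace->skip = 3;
	/* capture step deliberately omitted, as in the quick hack */
	trace->max_entries = trace->nr_entries;
}

/* Prints one line per recorded entry and returns the number of lines.
 * With nr_entries == 0 the loop body never runs and entries[] is
 * never dereferenced. */
static int model_print_stack_trace(const struct model_trace *trace, FILE *out)
{
	unsigned int i;

	assert(trace != NULL && out != NULL);

	for (i = 0; i < trace->nr_entries; i++)
		fprintf(out, "[<%016lx>]\n", trace->entries[i]);
	return (int)i;
}
```

In this model, a trace saved without capture ends up with nr_entries and max_entries both at 0, and printing it produces no output.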
> Please, let me try to resolve this as I find time to do so.
Sure, I just need to get my stuff to work. I'd appreciate any hints on
how to get lockdep semi-working/limping, if there is an easy way to do
so.
Anyway, thanks for your feedback.
- Bjoern
* Re: enabling lockdep causes boot failure on a SUN T2000
2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
` (3 preceding siblings ...)
2008-05-09 4:55 ` Bjoern Brandenburg
@ 2008-05-09 5:02 ` David Miller
4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-09 5:02 UTC (permalink / raw)
To: sparclinux
From: Bjoern Brandenburg <bbb@email.unc.edu>
Date: Fri, 9 May 2008 00:55:03 -0400
> On May 9, 2008, at 12:21 AM, David Miller wrote:
> Sure, I just need to get my stuff to work. I'd appreciate any hints on
> how to get lockdep semi-working/limping, if there is an easy way to do
> so.
That's not possible until I figure out what the problem is and fix the
bug. And keeping this email thread going is not conducive to me being
able to get the time necessary to do so. :-/