* enabling lockdep causes boot failure on a SUN T2000
@ 2008-05-08 22:39 Bjoern B. Brandenburg
  2008-05-08 23:07 ` David Miller
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Bjoern B. Brandenburg @ 2008-05-08 22:39 UTC (permalink / raw)
  To: sparclinux

Hi all,

I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25) 
to boot on our Niagara box.

With many configurations, the system gets "stuck" during boot when 
executing init from the initramfs (Ubuntu). I've been test-booting 
various configurations now for three days, and I've finally been able to 
narrow it down to what appears to be lockdep.

With lockdep disabled, the system boots fine and runs stable. With 
lockdep enabled on 2.6.24, the system starts executing the init scripts 
from the initramfs and then just sits there. With 2.6.25, the system 
also dies but I had more luck and could capture some debug output.

The case that works:
	config: http://www.cs.unc.edu/~bbb/linux/config-boots-fine
	boot:   http://www.cs.unc.edu/~bbb/linux/boot.ok.txt

The case that doesn't work:
	config: http://www.cs.unc.edu/~bbb/linux/config-stuck-initramfs
	boot:   http://www.cs.unc.edu/~bbb/linux/boot.fail.txt

The only difference is that I enabled lockdep (see diff below). This is 
problematic for us since we would like to use lockdep to debug a patch 
that we are working on.

Please let me know how we can help to debug this.

Thanks,
Bjoern


--- config-boots-fine   2008-05-08 17:39:44.000000000 -0400
+++ config-stuck-initramfs      2008-05-08 17:18:25.000000000 -0400
@@ -1,7 +1,7 @@
  #
  # Automatically generated make config: don't edit
  # Linux kernel version: 2.6.25
-# Thu May  8 17:21:04 2008
+# Thu May  8 16:58:26 2008
  #
  CONFIG_SPARC=y
  CONFIG_SPARC64=y
@@ -977,19 +979,24 @@
  CONFIG_DEBUG_PREEMPT=y
  # CONFIG_DEBUG_RT_MUTEXES is not set
  # CONFIG_RT_MUTEX_TESTER is not set
-# CONFIG_DEBUG_SPINLOCK is not set
-# CONFIG_DEBUG_MUTEXES is not set
-# CONFIG_DEBUG_LOCK_ALLOC is not set
-# CONFIG_PROVE_LOCKING is not set
+CONFIG_DEBUG_SPINLOCK=y
+CONFIG_DEBUG_MUTEXES=y
+CONFIG_DEBUG_LOCK_ALLOC=y
+CONFIG_PROVE_LOCKING=y
+CONFIG_LOCKDEP=y
  # CONFIG_LOCK_STAT is not set
+# CONFIG_DEBUG_LOCKDEP is not set
+CONFIG_TRACE_IRQFLAGS=y
  CONFIG_DEBUG_SPINLOCK_SLEEP=y
  CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
+CONFIG_STACKTRACE=y
  # CONFIG_DEBUG_KOBJECT is not set
  CONFIG_DEBUG_BUGVERBOSE=y
  CONFIG_DEBUG_INFO=y
  # CONFIG_DEBUG_VM is not set
  CONFIG_DEBUG_LIST=y
  # CONFIG_DEBUG_SG is not set
+CONFIG_FRAME_POINTER=y
  # CONFIG_BOOT_PRINTK_DELAY is not set
  # CONFIG_RCU_TORTURE_TEST is not set
  # CONFIG_BACKTRACE_SELF_TEST is not set


* Re: enabling lockdep causes boot failure on a SUN T2000
  2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
@ 2008-05-08 23:07 ` David Miller
  2008-05-09  1:50 ` Bjoern B. Brandenburg
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-08 23:07 UTC (permalink / raw)
  To: sparclinux

From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
Date: Thu, 08 May 2008 18:39:33 -0400

> I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25) 
> to boot on our Niagara box.

We know about this problem already.  Thanks for reporting.

This changeset below was meant to improve things, but t1000/t2000
systems still get wedged.

I'll investigate when I get a chance.

commit 85a793533524f333e8d630dc22450e574b7e08d2
Author: David S. Miller <davem@davemloft.net>
Date:   Mon Mar 24 20:06:24 2008 -0700

    [SPARC64]: Make save_stack_trace() more efficient.
    
    Doing a 'flushw' every stack trace capture creates so much overhead
    that it makes lockdep next to unusable.
    
    We only care about the frame pointer chain and the function caller
    program counters, so flush those by hand to the stack frame.
    
    This is significantly more efficient than a 'flushw' because:
    
    1) We only save 16 bytes per active register window to the stack.
    
    2) This doesn't push the entire register window context of the current
       call chain out of the cpu, forcing register window fill traps as we
       return back down.
    
    Note that we can't use 'restore' and 'save' instructions to move
    around the register windows because that wouldn't work on Niagara
    processors.  They optimize 'save' into a new register window by
    simply clearing out the registers instead of pulling them in from
    the on-chip register window backing store.
    
    Based upon a report by Tom Callaway.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
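
The idea in the commit message above, that a backtrace needs only the frame pointer chain and the caller program counters, can be sketched in portable C. This is a minimal userspace model and not the sparc64 implementation: `struct frame` and `walk_frames()` are hypothetical names invented for illustration, and the real code reads these values from the register window save area rather than from a C struct. On a 64-bit target the two fields occupy 16 bytes, which is presumably what the "16 bytes per active register window" figure refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one stack frame: only the two
 * values the commit message says a backtrace needs. */
struct frame {
	const struct frame *caller_fp;	/* saved frame pointer of the caller */
	unsigned long caller_pc;	/* program counter of the call site */
};

/* Follow the frame pointer chain and record one caller PC per frame.
 * Stops at a NULL frame pointer or when the output buffer is full.
 * Returns the number of entries written. */
static size_t walk_frames(const struct frame *fp,
			  unsigned long *entries, size_t max_entries)
{
	size_t n = 0;

	assert(entries != NULL);

	while (fp != NULL && n < max_entries) {
		entries[n++] = fp->caller_pc;
		fp = fp->caller_fp;
	}
	return n;
}
```

The walk is only as good as the saved values: if a frame's slots were never written to memory, `caller_fp` points at stale data, which is why the commit flushes exactly these two values for every active window.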


* Re: enabling lockdep causes boot failure on a SUN T2000
  2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
  2008-05-08 23:07 ` David Miller
@ 2008-05-09  1:50 ` Bjoern B. Brandenburg
  2008-05-09  4:21 ` David Miller
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Bjoern B. Brandenburg @ 2008-05-09  1:50 UTC (permalink / raw)
  To: sparclinux

David Miller wrote:
> From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
>> I've been having a hard time getting Linux (vanilla 2.6.24 and 2.6.25) 
>> to boot on our Niagara box.
> 
> We know about this problem already.  Thanks for reporting.
> 
> This changeset below was meant to improve things, but t1000/t2000
> systems still get wedged.

Thanks for the pointer.

Based on your commit, I tried to disable stack trace support (with the 
intent to get at least some output in the event of deadlock) with the 
following patch:

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index e2c07ec..b468790 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -342,7 +342,9 @@ static int save_trace(struct stack_trace *trace)

         trace->skip = 3;

-       save_stack_trace(trace);
+       /* Work around: this doesn't work on the T2000.
+        * save_stack_trace(trace);
+        */

         trace->max_entries = trace->nr_entries;


I figured that if the flushes are the root cause, then this should take 
care of the problem. However, the kernel still doesn't boot.

Is there anything else that I can disable to get a partially working 
lockdep?

Thanks,
Bjoern


* Re: enabling lockdep causes boot failure on a SUN T2000
  2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
  2008-05-08 23:07 ` David Miller
  2008-05-09  1:50 ` Bjoern B. Brandenburg
@ 2008-05-09  4:21 ` David Miller
  2008-05-09  4:55 ` Bjoern Brandenburg
  2008-05-09  5:02 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-09  4:21 UTC (permalink / raw)
  To: sparclinux

From: "Bjoern B. Brandenburg" <bbb@email.unc.edu>
Date: Thu, 08 May 2008 21:50:06 -0400

> I figured that if the flushes are the root cause, then this should take 
> care of the problem. However, the kernel still doesn't boot.

That patch you posted is dangerous.  If the stack isn't flushed,
there will be garbage in the frame pointer slots of the stack
frame, and likely that will cause an OOPS.

Please, let me try to resolve this as I find time to do so.  I am the
only person who knows enough about all of these internals to solve the
problem at this time.


* Re: enabling lockdep causes boot failure on a SUN T2000
  2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
                   ` (2 preceding siblings ...)
  2008-05-09  4:21 ` David Miller
@ 2008-05-09  4:55 ` Bjoern Brandenburg
  2008-05-09  5:02 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Bjoern Brandenburg @ 2008-05-09  4:55 UTC (permalink / raw)
  To: sparclinux

On May 9, 2008, at 12:21 AM, David Miller wrote:
>> I figured that if the flushes are the root cause, then this should take
>> care of the problem. However, the kernel still doesn't boot.
>
> That patch you posted is dangerous.  If the stack isn't flushed,
> there will be garbage in the frame pointer slots of the stack
> frame, and likely that will cause an OOPS.
>

That was not intended as a proper patch, that was just a quick hack
attempt to help me solve my problems.

But I'm curious. How is it dangerous not to call save_stack_trace()?
No frame pointers will be accessed since no stack trace is recorded.
If nothing is recorded, how can they be accessed later?
trace->nr_entries just remains 0, and print_stack_trace() properly
handles stack traces of length 0. I assume the rest of lockdep is
robust enough to handle such a corner case, too.

(I'm looking at the 2.6.24 source, since that is what I'm working with  
atm.)
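
The argument above can be sketched as a small userspace model. This is an illustration under stated assumptions, not kernel code: `save_trace_model()` and `print_trace_model()` are hypothetical stand-ins for lockdep's save_trace() and for print_stack_trace(), and the struct only mirrors the fields the argument relies on. It shows that an empty trace is never dereferenced by the printer; it says nothing about other code paths, nor about why the T2000 still hangs.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified userspace stand-in for the kernel's struct stack_trace. */
struct stack_trace {
	unsigned int nr_entries;
	unsigned int max_entries;
	unsigned long *entries;
	int skip;
};

/* Model of save_trace() with the capture call left out, as in the
 * quick hack: nothing ever increments nr_entries. */
static void save_trace_model(struct stack_trace *trace,
			     unsigned long *buf, unsigned int room)
{
	assert(trace != NULL);

	trace->nr_entries = 0;
	trace->max_entries = room;
	trace->entries = buf;
	trace->skip = 3;

	/* save_stack_trace(trace) is deliberately not called here. */

	trace->max_entries = trace->nr_entries;
}

/* Model of a trace printer: one line per recorded entry.
 * Returns the number of entries printed. */
static unsigned int print_trace_model(const struct stack_trace *trace)
{
	unsigned int i;

	assert(trace != NULL);

	for (i = 0; i < trace->nr_entries; i++)
		printf(" [<%016lx>]\n", trace->entries[i]);
	return i;
}
```

With nr_entries at 0 the loop body never runs, so the entries buffer is never read, whatever it contains.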

> Please, let me try to resolve this as I find time to do so.

Sure, I just need to get my stuff to work. I'd appreciate any hints on  
how to get lockdep semi-working/limping, if there is an easy way to do  
so.

Anyway, thanks for your feedback.

- Bjoern



* Re: enabling lockdep causes boot failure on a SUN T2000
  2008-05-08 22:39 enabling lockdep causes boot failure on a SUN T2000 Bjoern B. Brandenburg
                   ` (3 preceding siblings ...)
  2008-05-09  4:55 ` Bjoern Brandenburg
@ 2008-05-09  5:02 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-05-09  5:02 UTC (permalink / raw)
  To: sparclinux

From: Bjoern Brandenburg <bbb@email.unc.edu>
Date: Fri, 9 May 2008 00:55:03 -0400

> On May 9, 2008, at 12:21 AM, David Miller wrote:
> Sure, I just need to get my stuff to work. I'd appreciate any hints on  
> how to get lockdep semi-working/limping, if there is an easy way to do  
> so.

That's not possible until I figure out what the problem is and fix the
bug.  And keeping this email thread going is not conducive to me being
able to get the time necessary to do so. :-/

