* 2.6.14-rc3-rt10 crashes on boot
@ 2005-10-07  0:37 John Rigg
  2005-10-07  6:36 ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: John Rigg @ 2005-10-07  0:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar


Now that rt kernels are compiling again on x86_64 I'm getting
a recurrence of a boot problem I had with 2.6.13-rt4 on a dual
Opteron. If I enable latency tracing it crashes during boot, but 
if I use exactly the same .config except with latency tracing 
disabled it boots and runs fine. Every version of rt I've managed 
to compile since then has crashed in the same place with latency
tracing enabled. 
Below are excerpts from .config and from boot messages via serial 
console.
                      ________________________

CONFIG_SMP=y
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_SOFTIRQS=y
CONFIG_PREEMPT_HARDIRQS=y
CONFIG_PREEMPT_BKL=y
CONFIG_PREEMPT_RCU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
# CONFIG_PRINTK_IGNORE_LOGLEVEL is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_IRQ_FLAGS=y
CONFIG_WAKEUP_TIMING=y
CONFIG_WAKEUP_LATENCY_HIST=y
CONFIG_PREEMPT_TRACE=y
# CONFIG_CRITICAL_PREEMPT_TIMING is not set
# CONFIG_CRITICAL_IRQSOFF_TIMING is not set
CONFIG_LATENCY_TIMING=y
CONFIG_LATENCY_HIST=y
CONFIG_LATENCY_TRACE=y
CONFIG_MCOUNT=y
CONFIG_RT_DEADLOCK_DETECT=y
# CONFIG_DEBUG_RT_LOCKING_MODE is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_FS is not set
CONFIG_FRAME_POINTER=y
# CONFIG_INIT_DEBUG is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_KPROBES is not set
                     ______________________________

<snip>
Bootdata ok (command line is root=/dev/hda7 ro console=tty0 console=ttyS0,38400 )
Linux version 2.6.14-rc3-rt10-mindriv-debug-amd64-k8-smp (jqr@mj0lnir) (gcc version 4.0.2 20050917 (prerelease) (Debian 4.0.1-8)) #1 SMP PREEMPT Thu Oct 6 22:43:49 UTC 2005
</snip>
...

<snip>
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdd: _NEC DVD_RW ND-3520A, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 80293248 sectors (41110 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(133)
hda: cache flushes supported
 hda: hda1 hda2 < hda5 hda6 hda7 >
umount: devfs: not mounted
mount: unknown filesystem type 'devfs'
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
umount: devfs: not mounted
INIT: version 2.86 booting
hotplug[877]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffff8bee68 error 15
hotplug[878]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffb1a408 error 15
hotplug[879]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffff878408 error 15
hotplug[880]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffad36d8 error 15
init[1]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffc00b10 error 15
init[1]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffc003b8 error 15
rcS[882]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffff967428 error 15
init[1]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffc003b8 error 15
init[1]: segfault at ffffffff8010f588 rip ffffffff8010f588 rsp 00007fffffc003b8 error 15
</snip>

After this point the same error message is repeated in an infinite loop and a 
hard reboot is required. 
If I get time in the next few days I'll try and find the earliest
version of rt that does this (haven't tried anything earlier than 2.6.13-rt4).
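
A note on reading the segfault lines above: in every line the faulting address equals rip and lies in the upper (kernel) half of the address space, so each process appears to be trying to execute a kernel address from user mode. The sketch below decodes the trailing error value using the architectural x86 page-fault error-code bits. It assumes the kernel prints that value in hexadecimal, so that "error 15" means 0x15; `struct fault_info`, `decode_fault` and `print_fault` are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* Architectural x86 page-fault error-code bits. */
#define PF_PROT  (1UL << 0) /* 1 = protection violation, 0 = page not present */
#define PF_WRITE (1UL << 1) /* 1 = write access */
#define PF_USER  (1UL << 2) /* 1 = fault taken in user mode */
#define PF_RSVD  (1UL << 3) /* 1 = reserved bit set in a page-table entry */
#define PF_INSTR (1UL << 4) /* 1 = instruction fetch */

/* Hypothetical container for the decoded bits (each field is 0 or 1). */
struct fault_info {
    int protection;
    int write;
    int user;
    int reserved;
    int instr_fetch;
};

struct fault_info decode_fault(unsigned long err)
{
    struct fault_info f;

    f.protection  = !!(err & PF_PROT);
    f.write       = !!(err & PF_WRITE);
    f.user        = !!(err & PF_USER);
    f.reserved    = !!(err & PF_RSVD);
    f.instr_fetch = !!(err & PF_INSTR);
    return f;
}

/* Print a one-line, human-readable summary of an error code. */
void print_fault(FILE *out, unsigned long err)
{
    struct fault_info f = decode_fault(err);

    assert(out != NULL);
    fprintf(out, "error %lx: %s, %s, %s mode%s\n", err,
            f.protection ? "protection violation" : "page not present",
            f.instr_fetch ? "instruction fetch" : (f.write ? "write" : "read"),
            f.user ? "user" : "kernel",
            f.reserved ? ", reserved bit set" : "");
}
```

Under that assumption, `decode_fault(0x15)` reports a protection violation on an instruction fetch from user mode, which would be consistent with user-space code jumping to a kernel address.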

John

* 2.6.14-rc3-rt10 crashes on boot
@ 2005-10-07 15:11 John Rigg
  2005-10-07 15:48 ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: John Rigg @ 2005-10-07 15:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

On Friday 7 October 2005 Ingo Molnar wrote:
>i got overflows in initramfs's gunzip with certain debug options. I have 
>improved the stack footprint of the worst offenders in -rt11 (see the 
>standalone patch below) - John, does it boot any better?

Ah. I'm using initrd. With CONFIG_LATENCY_TRACE=y my initrd.img is
large, > 3.6MB. Maybe it's time to try initramfs.

BTW I'm having trouble enabling DEBUG_STACKOVERFLOW. I can see
it in arch/i386/Kconfig.debug (and not in arch/x86_64/Kconfig.debug), 
but it doesn't appear in menuconfig no matter what other kernel hacking 
options I enable. If I add it manually to .config it just gets removed 
by `make oldconfig'. Is this an x86_64 issue?

For now I'll assume that there is a stack overflow and try initramfs.

John

* Re: 2.6.14-rc3-rt10 crashes on boot
@ 2005-10-07 19:16 John Rigg
  0 siblings, 0 replies; 11+ messages in thread
From: John Rigg @ 2005-10-07 19:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Steven Rostedt

On Friday, October 7 Steve Rostedt wrote:

>Add this patch and it will add the option for you in x86_64 (I forgot that
>you were using that).  I even set it to be default on. I didn't add a test
>in do_IRQ, but I believe that the tests in latency.c should be good
>enough.

Hi Steve,

Thanks for the patch. I applied it to 2.6.14-rc3-rt12, looked in
arch/x86_64/Kconfig.debug just to be sure it applied OK to -rt12,
then ran make. It failed to compile, with the following message:

  CC      kernel/rt.o
  CC      kernel/latency.o
kernel/latency.c: In function '__print_worst_stack':
kernel/latency.c:336: warning: format '%d' expects type 'int', but argument 5 has type 'long unsigned int'
kernel/latency.c:384:3: error: #error Poke the author of above asm code line !
kernel/latency.c: In function 'debug_stackoverflow':
kernel/latency.c:386: error: 'STACK_WARN' undeclared (first use in this function)
kernel/latency.c:386: error: (Each undeclared identifier is reported only once
kernel/latency.c:386: error: for each function it appears in.)
make[1]: *** [kernel/latency.o] Error 1
make: *** [kernel] Error 2

I wonder if DEBUG_STACKOVERFLOW was left out of x86_64 for this reason.

John

[parent not found: <E1ENxei-0001C9-F7@localhost.localdomain>]
* Re: 2.6.14-rc3-rt10 crashes on boot
@ 2005-10-09 16:28 John Rigg
  0 siblings, 0 replies; 11+ messages in thread
From: John Rigg @ 2005-10-09 16:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Steven Rostedt

On Friday, October 7 Steven Rostedt wrote:

>Here's an addon patch to my last one.  I don't know x86_64 very well, but
>I believe the asm is pretty much the same, so this patch removes the
>check for __i386__ and also defines STACK_WARN.

>Index: linux-rt-quilt/include/asm-x86_64/page.h
>===================================================================
>--- linux-rt-quilt.orig/include/asm-x86_64/page.h	2005-10-06 08:04:00.000000000 -0400
>+++ linux-rt-quilt/include/asm-x86_64/page.h	2005-10-07 15:34:20.000000000 -0400
>@@ -21,6 +21,8 @@
> #endif
> #define CURRENT_MASK (~(THREAD_SIZE-1))
>
>+#define STACK_WARN             (THREAD_SIZE/8)
>+
> #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
> #define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
>
>Index: linux-rt-quilt/kernel/latency.c
>===================================================================
>--- linux-rt-quilt.orig/kernel/latency.c	2005-10-06 08:04:56.000000000 -0400
>+++ linux-rt-quilt/kernel/latency.c	2005-10-07 15:31:20.000000000 -0400
>@@ -377,7 +377,8 @@
> 	atomic_inc(&tr->disabled);
>
> 	/* Debugging check for stack overflow: is there less than 1KB free? */
>-#ifdef __i386__
>+#if 1 // def __i386__
>+	/* Hopefully this works on x86_64!  */
> 	__asm__ __volatile__("andl %%esp,%0" :
> 				"=r" (stack_left) : "0" (THREAD_SIZE - 1));
> #else

Steve, thanks for these patches. I got it to compile with 2.6.14-rc3-rt12
but had to change the assembly lines in (patched) latency.c to

__asm__ __volatile__("and %%rsp,%0" :
 				"=r" (stack_left) : "0" (THREAD_SIZE - 1));

i.e. `and' instead of `andl' and `%%rsp' instead of `%%esp'.
Somebody who understands x86_64 assembly better than I do should probably check 
this before anyone tries using it.
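
The inline asm above computes stack_left by ANDing the stack pointer with THREAD_SIZE - 1. For a stack that is THREAD_SIZE-aligned and grows downward, that offset is the number of bytes still free below the stack pointer. Since %rsp is a 64-bit register, the output operand also has to be 64 bits wide for `and %%rsp,%0' to assemble, so stack_left needs a type such as unsigned long. Below is a minimal user-space sketch of the same arithmetic in plain C. It assumes THREAD_SIZE is 8192 (which makes STACK_WARN from the patch 1024, matching the "less than 1KB free" comment) and ignores whatever the kernel keeps at the bottom of the stack; `stack_left_at` and `stack_nearly_full` are hypothetical helpers, not functions from latency.c.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192UL            /* assumption: 8 KiB kernel stacks */
#define STACK_WARN  (THREAD_SIZE / 8) /* as defined in the patch above */

/*
 * Offset of sp within its THREAD_SIZE-aligned stack. Because the stack
 * grows down towards the aligned base, this is the space still free.
 */
unsigned long stack_left_at(uintptr_t sp)
{
    /* The mask trick only works when THREAD_SIZE is a power of two. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return (unsigned long)(sp & (THREAD_SIZE - 1));
}

/* Non-zero when fewer than STACK_WARN bytes remain. */
int stack_nearly_full(uintptr_t sp)
{
    return stack_left_at(sp) < STACK_WARN;
}
```

For a stack based at 0x10000, a stack pointer of 0x10300 leaves 0x300 (768) bytes and trips the check, while 0x11f00 leaves 0x1f00 bytes and does not.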
While I was at it I changed a printk arg in line 335 of (patched) latency.c - 
I think the last %d should be %ld, i.e.

printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
	worst_stack_comm, worst_stack_pid, MAX_STACK-worst_stack_left, MAX_STACK); 
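
The compiler warning quoted earlier in the thread says the argument in question has type 'long unsigned int'. `%ld' has the right width for it, and `%lu' is the conversion that also matches its signedness. A small user-space sketch of the corrected format string, using snprintf in place of printk and assuming MAX_STACK is 8192; `format_footprint` is a hypothetical wrapper, not a function in latency.c.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STACK 8192UL /* assumption: same value as THREAD_SIZE */

/*
 * Format the stack-footprint message into buf. Both byte counts are
 * unsigned long, so both use %lu.
 */
int format_footprint(char *buf, size_t len, const char *comm, int pid,
                     unsigned long worst_stack_left)
{
    assert(buf != NULL && comm != NULL);
    return snprintf(buf, len,
        "| new stack-footprint maximum: %s/%d, %lu bytes (out of %lu bytes).\n",
        comm, pid, MAX_STACK - worst_stack_left, MAX_STACK);
}
```

Calling `format_footprint(buf, sizeof buf, "init", 1, 1024)` under these assumptions reports 7168 bytes used out of 8192.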

John

* Re: 2.6.14-rc3-rt10 crashes on boot
@ 2005-10-09 16:30 John Rigg
  0 siblings, 0 replies; 11+ messages in thread
From: John Rigg @ 2005-10-09 16:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

On Sunday October 9 2005 John Rigg wrote:
>Steve, thanks for these patches. I got it to compile with 2.6.14-rc3-rt12
>but had to change the assembly lines in (patched) latency.c to
>
>        __asm__ __volatile__("and %%rsp,%0" :
> 				"=r" (stack_left) : "0" (THREAD_SIZE - 1));
>
>i.e. `and' instead of `andl' and `%%rsp' instead of `%%esp'.
>Somebody who understands x86_64 assembly better than I do should probably check 
>this before anyone tries using it.
>While I was at it I changed a printk arg in line 335 of (patched)
>latency.c - I think the last %d should be %ld, i.e.
>
>printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
>	worst_stack_comm, worst_stack_pid, MAX_STACK-worst_stack_left, MAX_STACK);

Ingo, thanks to help from Steve Rostedt I got 2.6.14-rc3-rt12 to compile
with CONFIG_DEBUG_STACKOVERFLOW=y on x86_64 smp. Unfortunately if I enable 
it along with latency tracing (which is causing the crash during boot) 
it crashes so early that I can't get anything from the serial console,
even using earlyprintk. All I get is a blank screen for a few seconds
then the machine reboots.
I have all other debugging options disabled (apart from necessary dependencies 
for these two). This of course means that I can't confirm whether the crash
is caused by stack overflow.
With latency tracing disabled but CONFIG_DEBUG_STACKOVERFLOW=y the kernel 
boots and runs fine.
BTW this is still with initrd. If the stack footprint is likely to
be smaller with initramfs I'll give it a try, but it'll be a few days
before I can set this up (I still have to work out how to use initramfs).

John

[parent not found: <E1EOe3U-00018A-Fy@localhost.localdomain>]

end of thread, other threads:[~2005-10-11 11:17 UTC | newest]

Thread overview: 11+ messages
2005-10-07  0:37 2.6.14-rc3-rt10 crashes on boot John Rigg
2005-10-07  6:36 ` Steven Rostedt
2005-10-07 11:43   ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2005-10-07 15:11 John Rigg
2005-10-07 15:48 ` Steven Rostedt
2005-10-07 19:16 John Rigg
     [not found] <E1ENxei-0001C9-F7@localhost.localdomain>
2005-10-07 19:42 ` Steven Rostedt
2005-10-11 10:55   ` Ingo Molnar
2005-10-09 16:28 John Rigg
2005-10-09 16:30 John Rigg
     [not found] <E1EOe3U-00018A-Fy@localhost.localdomain>
2005-10-11 11:17 ` Ingo Molnar
