Linux MIPS Architecture development
* crash in first printk of start_kernel
From: Attila Kinali @ 2007-03-09 18:13 UTC
  To: linux-mips

Hi,

I'm currently trying to bring up a new embedded system we built
using the Alchemy Au1550 CPU. The board itself looks like
a downsized version of the DB1550 evaluation board from AMD,
with 64MB DRAM and a 32MB NOR flash, plus a PC-Card controller
and one Ethernet port in use.

I'm using a 2.6.16.11 (an old snapshot from about last August,
when we started development of another board) that has slight
adjustments in various drivers to accommodate our
platform-specific stuff.

Booting from u-boot works fine, i.e. it can load the kernel
from tftp, the checksum is ok and the uncompression is ok. It then
jumps into the kernel but ends up in an exception handler when
executing the first printk in start_kernel (i.e. the printk(KERN_NOTICE);
at init/main.c:451).

While it sits in u-boot's "exception handler" endless loop,
ra points into vscnprintf() (lib/vsprintf.c:512),
where it calls vsnprintf() (at line 517).

If I set a breakpoint at lib/vsprintf.c:517 and follow it
into vsnprintf, I get the following gdb log:

---
Breakpoint 3, vscnprintf (buf=0x80411548 "", size=0x803b1fac,
    fmt=0x8036f10c "<5>", args=0x803b1fac) at lib/vsprintf.c:517
517             i=vsnprintf(buf,size,fmt,args);
(gdb) step
vsnprintf (buf=0x80411548 "", size=0x400, fmt=0x8036f10c "<5>", args=0xff0000)
    at lib/vsprintf.c:276
276             if (unlikely((int) size < 0)) {
(gdb) next
285             end = buf + size - 1;
(gdb) next
287             if (end < buf - 1) {
(gdb) p end
$1 = 0x80411947 ""
(gdb) p buf
$2 = 0x80411548 ""
---

If I go further from here (either next/step or continue), I end up
in the exception handler. So it must be something in the asm
of this line:

---
(gdb) disassemble $pc $pc+100
Dump of assembler code from 0x8026612c to 0x80266190:
0xffffffff8026612c <vsnprintf+68>:      addiu   v0,a0,-1
0xffffffff80266130 <vsnprintf+72>:      sltu    v0,s2,v0
0xffffffff80266134 <vsnprintf+76>:      beqz    v0,0x80266144 <vsnprintf+92>
0xffffffff80266138 <vsnprintf+80>:      bltz    a0,0x802461c0 <jffs2_remount_fs+144>
0xffffffff8026613c <vsnprintf+84>:      li      s2,-1
0xffffffff80266140 <vsnprintf+88>:      negu    s6,a0
0xffffffff80266144 <vsnprintf+92>:      lb      v0,0(a2)
0xffffffff80266148 <vsnprintf+96>:      beqz    v0,0x80266190 <vsnprintf+168>
0xffffffff8026614c <vsnprintf+100>:     move    a0,a2
0xffffffff80266150 <vsnprintf+104>:     lb      v1,0(a0)
0xffffffff80266154 <vsnprintf+108>:     li      v0,37
0xffffffff80266158 <vsnprintf+112>:     beq     v1,v0,0x802661dc <vsnprintf+244>
0xffffffff8026615c <vsnprintf+116>:     lbu     a0,4(a0)
0xffffffff80266160 <vsnprintf+120>:     sltu    v0,s2,s0
0xffffffff80266164 <vsnprintf+124>:     bnez    v0,0x80266174 <vsnprintf+140>
0xffffffff80266168 <vsnprintf+128>:     sllv    zero,zero,zero
0xffffffff8026616c <vsnprintf+132>:     sb      a0,0(s0)
0xffffffff80266170 <vsnprintf+136>:     lw      a2,80(sp)
0xffffffff80266174 <vsnprintf+140>:     addiu   s0,s0,1
0xffffffff80266178 <vsnprintf+144>:     addiu   v0,a2,1
0xffffffff8026617c <vsnprintf+148>:     sw      v0,80(sp)
0xffffffff80266180 <vsnprintf+152>:     lb      v1,0(v0)
0xffffffff80266184 <vsnprintf+156>:     move    a0,v0
0xffffffff80266188 <vsnprintf+160>:     bnez    v1,0x80266150 <vsnprintf+104>
0xffffffff8026618c <vsnprintf+164>:     move    a2,v0
End of assembler dump.
---
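
A quick check of the values printed by gdb above, as a minimal Python
sketch (not part of the original session; it assumes 32-bit pointers and
uses the buf and size values from the log): the comparison on line 287 is
false for these inputs, so the C logic itself looks fine and the beqz at
vsnprintf+76 would be expected to be taken.

```python
MASK32 = 0xFFFFFFFF  # assumption: pointers are 32 bits wide on this target

buf = 0x80411548   # value printed by gdb for buf
size = 0x400       # value printed by gdb for size inside vsnprintf

end = (buf + size - 1) & MASK32        # line 285: end = buf + size - 1
wrapped = end < ((buf - 1) & MASK32)   # line 287: if (end < buf - 1)

print(hex(end), wrapped)
```

The computed end matches the 0x80411947 that gdb reports, and the wrap
check does not fire.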

But if I use stepi from here on, it looks as if the
BDI2000's breakpoint (both soft and hard) somehow interferes
with the execution of the code:

---
(gdb) display/i $pc
1: x/i $pc  0x8026612c <vsnprintf+68>:  addiu   v0,a0,-1
(gdb) stepi
0xffffffff80266130      287             if (end < buf - 1) {
1: x/i $pc  0x80266130 <vsnprintf+72>:  sltu    v0,s2,v0
(gdb) stepi
287             if (end < buf - 1) {
1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287             if (end < buf - 1) {
1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287             if (end < buf - 1) {
1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287             if (end < buf - 1) {
1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
(gdb) stepi
---

Yes, the next instruction (bltz    a0,0x802461c0 <jffs2_remount_fs+144>)
looks fishy and should most probably be a NOP. If I patch this point
by hand and continue, it crashes at another point.

The only two explanations I can see for a wrong instruction at this
point are a gcc bug or a bug in u-boot's gunzip function, and I
somewhat doubt both. gcc is 3.3.1 from MontaVista, and both
gcc and u-boot work on our other board, which has a similar layout.

Does anyone have an idea what the problem here could be or how I could
narrow it down?

Thanks in advance

			Attila Kinali

-- 
Praised are the Fountains of Shelieth, the silver harp of the waters,
But blest in my name forever this stream that stanched my thirst!
                         -- Deed of Morred


* Re: crash in first printk of start_kernel
From: Ralf Baechle @ 2007-03-13  1:02 UTC
  To: Attila Kinali; +Cc: linux-mips

On Fri, Mar 09, 2007 at 07:13:54PM +0100, Attila Kinali wrote:

Grüzi Attila,

> I'm using a 2.6.16.11 (an old snapshot from about last August,
> when we started development of another board) that has slight
> adjustments in various drivers to accommodate our
> platform-specific stuff.

You may want to update anyway.  Between the linux-2.6.16.11 tag of the
linux-mips.org tree and the top of the linux-2.6.16-stable branch there are
almost 58,000 lines of patch.  Even more if you compare the MIPS -stable
branch to kernel.org's 2.6.16.11.  IOW, a few metric buttloads.

> Booting from u-boot works fine, i.e. it can load the kernel
> from tftp, the checksum is ok and the uncompression is ok. It then
> jumps into the kernel but ends up in an exception handler when
> executing the first printk in start_kernel (i.e. the printk(KERN_NOTICE);
> at init/main.c:451).
> 
> While it sits in u-boot's "exception handler" endless loop,
> ra points into vscnprintf() (lib/vsprintf.c:512),
> where it calls vsnprintf() (at line 517).
> 
> If I set a breakpoint at lib/vsprintf.c:517 and follow it
> into vsnprintf, I get the following gdb log:
> 
> ---
> Breakpoint 3, vscnprintf (buf=0x80411548 "", size=0x803b1fac,
>     fmt=0x8036f10c "<5>", args=0x803b1fac) at lib/vsprintf.c:517
> 517             i=vsnprintf(buf,size,fmt,args);
> (gdb) step
> vsnprintf (buf=0x80411548 "", size=0x400, fmt=0x8036f10c "<5>", args=0xff0000)
>     at lib/vsprintf.c:276
> 276             if (unlikely((int) size < 0)) {
> (gdb) next
> 285             end = buf + size - 1;
> (gdb) next
> 287             if (end < buf - 1) {
> (gdb) p end
> $1 = 0x80411947 ""
> (gdb) p buf
> $2 = 0x80411548 ""
> ---
> 
> If I go further from here (either next/step or continue), I end up
> in the exception handler. So it must be something in the asm
> of this line:
> 
> ---
> (gdb) disassemble $pc $pc+100
> Dump of assembler code from 0x8026612c to 0x80266190:

> 0xffffffff80266134 <vsnprintf+76>:      beqz    v0,0x80266144 <vsnprintf+92>
> 0xffffffff80266138 <vsnprintf+80>:      bltz    a0,0x802461c0 <jffs2_remount_fs+144>

A branch in the delay slot of another branch is forbidden by the MIPS
architecture.  Every processor I have ever tried this on misbehaves in
very non-obvious ways when this is attempted.
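
Such a sequence can be spotted mechanically. The following is a minimal
Python sketch that scans gdb-style disassembly, in the format of the dump
quoted above, for a branch sitting in the delay slot of another branch.
The mnemonic list, the regular expression and the function names are
illustrative assumptions, not an exhaustive MIPS decoder.

```python
import re

# Branch and jump mnemonics with a delay slot on classic MIPS (assumed list).
BRANCHES = {
    "b", "bal", "beq", "bne", "beqz", "bnez", "bltz", "bgez", "blez", "bgtz",
    "bltzal", "bgezal", "j", "jal", "jr", "jalr",
}

# Matches gdb-style lines such as
# "0xffffffff80266134 <vsnprintf+76>:      beqz    v0,0x80266144 <vsnprintf+92>"
# (an optional leading "> " from mail quoting is tolerated).
LINE = re.compile(r"^\s*(?:>\s*)?(0x[0-9a-fA-F]+)\s+<[^>]+>:\s+(\S+)")


def is_branch(mnemonic):
    # Branch-likely variants (beql, bnezl, ...) carry a trailing "l".
    return mnemonic in BRANCHES or (
        mnemonic.endswith("l") and mnemonic[:-1] in BRANCHES
    )


def branches_in_delay_slots(disassembly):
    """Return (branch_addr, slot_addr, slot_mnemonic) for every branch
    whose delay slot holds another branch."""
    insns = [m.groups() for m in map(LINE.match, disassembly.splitlines()) if m]
    hits = []
    for (addr, mnem), (next_addr, next_mnem) in zip(insns, insns[1:]):
        adjacent = int(next_addr, 16) == int(addr, 16) + 4
        if adjacent and is_branch(mnem) and is_branch(next_mnem):
            hits.append((addr, next_addr, next_mnem))
    return hits


# Four lines taken from the dump in this thread.
SAMPLE = """\
0xffffffff80266130 <vsnprintf+72>:      sltu    v0,s2,v0
0xffffffff80266134 <vsnprintf+76>:      beqz    v0,0x80266144 <vsnprintf+92>
0xffffffff80266138 <vsnprintf+80>:      bltz    a0,0x802461c0 <jffs2_remount_fs+144>
0xffffffff8026613c <vsnprintf+84>:      li      s2,-1
"""

print(branches_in_delay_slots(SAMPLE))
```

On the sample it reports the beqz at vsnprintf+76 with the bltz at
vsnprintf+80 in its delay slot. Other disassemblers print a different
line layout, so the regular expression would need adjusting for them.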

You may want to compare that against your vmlinux file.  If the vmlinux
binary also contains this bug, try building the affected source file with
-S to find out whether the bug is caused by the compiler or the assembler.
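
One way to do that comparison is to diff the bytes read back from target
memory against the same range of the kernel image, word by word. The
sketch below is Python and makes several assumptions that are not from
this thread: both ranges are already available as raw byte strings (for
instance from gdb's "dump binary memory" on the target side and from
"objcopy -O binary" on the vmlinux side), the byte order is passed in by
the caller, and the base address and data in the demo are made up.

```python
import struct


def diff_words(expected, actual, base, byteorder="little"):
    """Yield (address, expected_word, actual_word, xor_mask) for every
    32-bit word that differs between the two byte strings."""
    fmt = "<I" if byteorder == "little" else ">I"
    length = min(len(expected), len(actual)) & ~3  # whole words only
    for off in range(0, length, 4):
        (want,) = struct.unpack_from(fmt, expected, off)
        (got,) = struct.unpack_from(fmt, actual, off)
        if want != got:
            yield base + off, want, got, want ^ got


# Synthetic demo data (hypothetical): the middle word has one bit set
# that should be clear.
good = struct.pack("<3I", 0x11111111, 0x00000000, 0x22222222)
bad = struct.pack("<3I", 0x11111111, 0x00000004, 0x22222222)

for addr, want, got, mask in diff_words(good, bad, base=0x80100000):
    print(f"{addr:#010x}: expected {want:#010x}, got {got:#010x}, "
          f"differing bits {mask:#010x}")
```

If the image on disk and the memory contents differ, the compiler and
assembler are off the hook and the loader or the hardware becomes the
suspect. The XOR mask shows at a glance whether the differences are
isolated bits or whole corrupted words.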

> End of assembler dump.
> ---
> 
> But if I use stepi from here on, it looks as if the
> BDI2000's breakpoint (both soft and hard) somehow interferes
> with the execution of the code:

In single stepping mode your debugger probably executes branches by
software emulation.  Chances are the emulation does something different
for this illegal code sequence than actual hardware.

> ---
> (gdb) display/i $pc
> 1: x/i $pc  0x8026612c <vsnprintf+68>:  addiu   v0,a0,-1
> (gdb) stepi
> 0xffffffff80266130      287             if (end < buf - 1) {
> 1: x/i $pc  0x80266130 <vsnprintf+72>:  sltu    v0,s2,v0
> (gdb) stepi
> 287             if (end < buf - 1) {
> 1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287             if (end < buf - 1) {
> 1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287             if (end < buf - 1) {
> 1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287             if (end < buf - 1) {
> 1: x/i $pc  0x80266134 <vsnprintf+76>:  beqz    v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> ---
> 
> Yes, the next instruction (bltz    a0,0x802461c0 <jffs2_remount_fs+144>)
> looks fishy and should most probably be a NOP. If I patch this point
> by hand and continue, it crashes at another point.
> 
> The only two explanations I can see for a wrong instruction at this
> point are a gcc bug or a bug in u-boot's gunzip function, and I
> somewhat doubt both. gcc is 3.3.1 from MontaVista, and both
> gcc and u-boot work on our other board, which has a similar layout.

3.3.1 is _dangerously_ old.  At the very least update to 3.3.6 - or even
4.1.2.  A vanilla source gcc tree from ftp.gnu.org is fine to build the
kernel.

  Ralf


* Re: crash in first printk of start_kernel
From: Attila Kinali @ 2007-03-14  8:04 UTC
  To: Ralf Baechle; +Cc: linux-mips

Hoi Ralf,

On Tue, 13 Mar 2007 01:02:52 +0000
Ralf Baechle <ralf@linux-mips.org> wrote:

> On Fri, Mar 09, 2007 at 07:13:54PM +0100, Attila Kinali wrote:
> > I'm using a 2.6.16.11 (an old snapshot from about last August,
> > when we started development of another board) that has slight
> > adjustments in various drivers to accommodate our
> > platform-specific stuff.
> 
> You may want to update anyway.  Between the linux-2.6.16.11 tag of the
> linux-mips.org tree and the top of the linux-2.6.16-stable branch there are
> almost 58,000 lines of patch.  Even more if you compare the MIPS -stable
> branch to kernel.org's 2.6.16.11.  IOW, a few metric buttloads.

That's already planned. But I first have to get past those
deadlines. After that I can look into making the whole
build system upgradeable and manageable.

 
> > 0xffffffff80266134 <vsnprintf+76>:      beqz    v0,0x80266144 <vsnprintf+92>
> > 0xffffffff80266138 <vsnprintf+80>:      bltz    a0,0x802461c0 <jffs2_remount_fs+144>
> 
> A branch in the delay slot of another branch is forbidden by the MIPS
> architecture.  Every processor I have ever tried this on misbehaves in
> very non-obvious ways when this is attempted.
> 
> You may want to compare that against your vmlinux file.  If the vmlinux
> binary also contains this bug, try building the affected source file with
> -S to find out whether the bug is caused by the compiler or the assembler.

It turned out to be a ground bounce problem. Interestingly, the
"bug" was 100% reproducible, while normal ground problems are
totally random. That is why I thought it had to be something in the
software.
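
To illustrate how a hardware fault can produce such a plausible-looking
instruction, here is a minimal Python sketch that encodes three of the
odd instructions from the dump and compares each with the instruction
one would GUESS the compiler intended. The guesses (move s0,a0 in the
delay slot, lbu with offset 0, a plain nop) are assumptions read off the
surrounding code; the thread does not confirm them. The encoders cover
only the four instruction forms used here.

```python
# Register numbers from the standard MIPS register naming.
ZERO, A0, S0 = 0, 4, 16


def bltz(rs, pc, target):
    # REGIMM opcode (1) with rt = 0; the offset is counted in words
    # relative to the delay slot address (pc + 4).
    return (1 << 26) | (rs << 21) | (((target - (pc + 4)) >> 2) & 0xFFFF)


def addu(rd, rs, rt):
    # SPECIAL opcode (0), function 0x21; "move rd,rs" is assumed here
    # to be addu rd,rs,zero.
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x21


def sllv(rd, rt, rs):
    # SPECIAL opcode (0), function 0x04.
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x04


def lbu(rt, offset, base):
    # Opcode 0x24 with a 16-bit offset field.
    return (0x24 << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)


NOP = 0x00000000  # sll zero,zero,0

# (address, word as disassembled in the thread, word GUESSED to be intended)
CASES = [
    (0x80266138, bltz(A0, 0x80266138, 0x802461C0), addu(S0, A0, ZERO)),
    (0x8026615C, lbu(A0, 4, A0), lbu(A0, 0, A0)),
    (0x80266168, sllv(ZERO, ZERO, ZERO), NOP),
]

for addr, seen, guessed in CASES:
    flipped = seen ^ guessed
    print(f"{addr:#x}: seen {seen:#010x}, guessed {guessed:#010x}, "
          f"differing bits {flipped:#010x} ({bin(flipped).count('1')} bit)")
```

Under these assumptions each pair differs in exactly one bit, which would
fit a fault on the memory path better than a compiler or gunzip bug.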

> In single stepping mode your debugger probably executes branches by
> software emulation.  Chances are the emulation does something different
> for this illegal code sequence than actual hardware.

Oh.. nice to know. Thanks.
 
Thanks a lot for your answer and help,

			Attila Kinali
-- 
Praised are the Fountains of Shelieth, the silver harp of the waters,
But blest in my name forever this stream that stanched my thirst!
                         -- Deed of Morred

