* crash in first printk of start_kernel
@ 2007-03-09 18:13 Attila Kinali
2007-03-13 1:02 ` Ralf Baechle
0 siblings, 1 reply; 3+ messages in thread
From: Attila Kinali @ 2007-03-09 18:13 UTC (permalink / raw)
To: linux-mips
Hi,
I'm currently to bring up a new embedded system we build
using the Alchemy Au1550 CPU. The board itself looks like
a downsized version of the DB1550 evaluation board from AMD,
with 64MB DRAM and a 32MB NOR flash, plus PC-Card controller,
one ethernet port used.
I'm using a 2.6.16.11 (an old snapshot from about last august,
when we started development of another board) that has slight
adjustments in various drivers to accomodate for our platform
specific stuff.
While booting from u-boot works fine, ie it can load the kernel
from tftp, checksum is ok, uncompression is ok. It then jumps
into the kernel but ends up in an exception handler when executing
the first printk in start_kernel (ie the printk(KERN_NOTICE);
at init/main.c:451).
While in the "exception handler" endless loop from u-boot
ra points to vscnprintf() (lib/vsprintf.c:512)
where it calls vsnprintf() (at line 517).
If i set a breakpoint at lib/vsprintf.c:517 and follow it
into vsnprintf, i get the follwing gdb log:
---
Breakpoint 3, vscnprintf (buf=0x80411548 "", size=0x803b1fac,
fmt=0x8036f10c "<5>", args=0x803b1fac) at lib/vsprintf.c:517
517 i=vsnprintf(buf,size,fmt,args);
(gdb) step
vsnprintf (buf=0x80411548 "", size=0x400, fmt=0x8036f10c "<5>", args=0xff0000)
at lib/vsprintf.c:276
276 if (unlikely((int) size < 0)) {
(gdb) next
285 end = buf + size - 1;
(gdb) next
287 if (end < buf - 1) {
(gdb) p end
$1 = 0x80411947 ""
(gdb) p buf
$2 = 0x80411548 ""
---
if i go here further (either next/step or continue), then i end up
in the exception handler. So, it must be something in the asm
of this line:
---
(gdb) disassemble $pc $pc+100
Dump of assembler code from 0x8026612c to 0x80266190:
0xffffffff8026612c <vsnprintf+68>: addiu v0,a0,-1
0xffffffff80266130 <vsnprintf+72>: sltu v0,s2,v0
0xffffffff80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
0xffffffff80266138 <vsnprintf+80>: bltz a0,0x802461c0 <jffs2_remount_fs+144>
0xffffffff8026613c <vsnprintf+84>: li s2,-1
0xffffffff80266140 <vsnprintf+88>: negu s6,a0
0xffffffff80266144 <vsnprintf+92>: lb v0,0(a2)
0xffffffff80266148 <vsnprintf+96>: beqz v0,0x80266190 <vsnprintf+168>
0xffffffff8026614c <vsnprintf+100>: move a0,a2
0xffffffff80266150 <vsnprintf+104>: lb v1,0(a0)
0xffffffff80266154 <vsnprintf+108>: li v0,37
0xffffffff80266158 <vsnprintf+112>: beq v1,v0,0x802661dc <vsnprintf+244>
0xffffffff8026615c <vsnprintf+116>: lbu a0,4(a0)
0xffffffff80266160 <vsnprintf+120>: sltu v0,s2,s0
0xffffffff80266164 <vsnprintf+124>: bnez v0,0x80266174 <vsnprintf+140>
0xffffffff80266168 <vsnprintf+128>: sllv zero,zero,zero
0xffffffff8026616c <vsnprintf+132>: sb a0,0(s0)
0xffffffff80266170 <vsnprintf+136>: lw a2,80(sp)
0xffffffff80266174 <vsnprintf+140>: addiu s0,s0,1
0xffffffff80266178 <vsnprintf+144>: addiu v0,a2,1
0xffffffff8026617c <vsnprintf+148>: sw v0,80(sp)
0xffffffff80266180 <vsnprintf+152>: lb v1,0(v0)
0xffffffff80266184 <vsnprintf+156>: move a0,v0
0xffffffff80266188 <vsnprintf+160>: bnez v1,0x80266150 <vsnprintf+104>
0xffffffff8026618c <vsnprintf+164>: move a2,v0
End of assembler dump.
---
But if i use stepi from here on, then it looks like that the
BDI2000's breakpoint (both soft and hard) somehow interferes
with the execution of the code:
---
(gdb) display/i $pc
1: x/i $pc 0x8026612c <vsnprintf+68>: addiu v0,a0,-1
(gdb) stepi
0xffffffff80266130 287 if (end < buf - 1) {
1: x/i $pc 0x80266130 <vsnprintf+72>: sltu v0,s2,v0
(gdb) stepi
287 if (end < buf - 1) {
1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287 if (end < buf - 1) {
1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287 if (end < buf - 1) {
1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
(gdb) stepi
287 if (end < buf - 1) {
1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
(gdb) stepi
---
Yes, the next instruction (bltz a0,0x802461c0 <jffs2_remount_fs+144>)
looks fishy and should be most probably a NOP. If i patch this point
by hand and continue it crashes at another point.
The only two reasons why the instruction at this point could be
wrong, is either a gcc bug or a bug in u-boot's gunzip function.
And i somewhat doubt both. gcc is a 3.3.1 from Montavista and both
gcc and u-boot work on our other board which has a similar layout.
Has anyone an idea what the problem here could be or how i could
narrow it down?
Thanks in advance
Attila Kinali
--
Praised are the Fountains of Shelieth, the silver harp of the waters,
But blest in my name forever this stream that stanched my thirst!
-- Deed of Morred
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: crash in first printk of start_kernel
2007-03-09 18:13 crash in first printk of start_kernel Attila Kinali
@ 2007-03-13 1:02 ` Ralf Baechle
2007-03-14 8:04 ` Attila Kinali
0 siblings, 1 reply; 3+ messages in thread
From: Ralf Baechle @ 2007-03-13 1:02 UTC (permalink / raw)
To: Attila Kinali; +Cc: linux-mips
On Fri, Mar 09, 2007 at 07:13:54PM +0100, Attila Kinali wrote:
Grüzi Attila,
> I'm using a 2.6.16.11 (an old snapshot from about last august,
> when we started development of another board) that has slight
> adjustments in various drivers to accomodate for our platform
> specific stuff.
You may want to update anyway. between the linux-2.6.16.11 tag of the
linux-mips.org and the top of the linux-2.6.16-stable branch there are
almost 58,000 lines of patch. Even more if your compare the MIPS -stable
branch to kernel.org's 2.6.16.11. Iow a few metric buttloads.
> While booting from u-boot works fine, ie it can load the kernel
> from tftp, checksum is ok, uncompression is ok. It then jumps
> into the kernel but ends up in an exception handler when executing
> the first printk in start_kernel (ie the printk(KERN_NOTICE);
> at init/main.c:451).
>
> While in the "exception handler" endless loop from u-boot
> ra points to vscnprintf() (lib/vsprintf.c:512)
> where it calls vsnprintf() (at line 517).
>
> If i set a breakpoint at lib/vsprintf.c:517 and follow it
> into vsnprintf, i get the follwing gdb log:
>
> ---
> Breakpoint 3, vscnprintf (buf=0x80411548 "", size=0x803b1fac,
> fmt=0x8036f10c "<5>", args=0x803b1fac) at lib/vsprintf.c:517
> 517 i=vsnprintf(buf,size,fmt,args);
> (gdb) step
> vsnprintf (buf=0x80411548 "", size=0x400, fmt=0x8036f10c "<5>", args=0xff0000)
> at lib/vsprintf.c:276
> 276 if (unlikely((int) size < 0)) {
> (gdb) next
> 285 end = buf + size - 1;
> (gdb) next
> 287 if (end < buf - 1) {
> (gdb) p end
> $1 = 0x80411947 ""
> (gdb) p buf
> $2 = 0x80411548 ""
> ---
>
> if i go here further (either next/step or continue), then i end up
> in the exception handler. So, it must be something in the asm
> of this line:
>
> ---
> (gdb) disassemble $pc $pc+100
> Dump of assembler code from 0x8026612c to 0x80266190:
> 0xffffffff80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> 0xffffffff80266138 <vsnprintf+80>: bltz a0,0x802461c0 <jffs2_remount_fs+144>
A branch in the delay slot of another branch is forbidden by the MIPS
architecture. All processors I every tried this on missbehave in very
unobvious ways when this is attempted.
You may want to compare that against your vmlinux file. If the vmlinux
binary also contains this bug, try building the affected source file with
-S to find if the bug is cause by compiler or assembler.
> End of assembler dump.
> ---
>
> But if i use stepi from here on, then it looks like that the
> BDI2000's breakpoint (both soft and hard) somehow interferes
> with the execution of the code:
In single stepping mode your debugger probably executes branches by
software emulation. Chances are the emulation does something different
for this illegal code sequence than actual hardware.
> ---
> (gdb) display/i $pc
> 1: x/i $pc 0x8026612c <vsnprintf+68>: addiu v0,a0,-1
> (gdb) stepi
> 0xffffffff80266130 287 if (end < buf - 1) {
> 1: x/i $pc 0x80266130 <vsnprintf+72>: sltu v0,s2,v0
> (gdb) stepi
> 287 if (end < buf - 1) {
> 1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287 if (end < buf - 1) {
> 1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287 if (end < buf - 1) {
> 1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> 287 if (end < buf - 1) {
> 1: x/i $pc 0x80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> (gdb) stepi
> ---
>
> Yes, the next instruction (bltz a0,0x802461c0 <jffs2_remount_fs+144>)
> looks fishy and should be most probably a NOP. If i patch this point
> by hand and continue it crashes at another point.
>
> The only two reasons why the instruction at this point could be
> wrong, is either a gcc bug or a bug in u-boot's gunzip function.
> And i somewhat doubt both. gcc is a 3.3.1 from Montavista and both
> gcc and u-boot work on our other board which has a similar layout.
3.3.1 is _dangerously_ old. At the very least update to 3.3.6 - or even
4.1.2. A vanilla source gcc tree from ftp.gnu.org is fine to build the
kernel.
Ralf
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: crash in first printk of start_kernel
2007-03-13 1:02 ` Ralf Baechle
@ 2007-03-14 8:04 ` Attila Kinali
0 siblings, 0 replies; 3+ messages in thread
From: Attila Kinali @ 2007-03-14 8:04 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
Hoi Ralf,
On Tue, 13 Mar 2007 01:02:52 +0000
Ralf Baechle <ralf@linux-mips.org> wrote:
> On Fri, Mar 09, 2007 at 07:13:54PM +0100, Attila Kinali wrote:
> > I'm using a 2.6.16.11 (an old snapshot from about last august,
> > when we started development of another board) that has slight
> > adjustments in various drivers to accomodate for our platform
> > specific stuff.
>
> You may want to update anyway. between the linux-2.6.16.11 tag of the
> linux-mips.org and the top of the linux-2.6.16-stable branch there are
> almost 58,000 lines of patch. Even more if your compare the MIPS -stable
> branch to kernel.org's 2.6.16.11. Iow a few metric buttloads.
That's already planned. But if first have to get past those
dead lines. After that i can look into makeing the whole
build system upgradeable and managable-
> > 0xffffffff80266134 <vsnprintf+76>: beqz v0,0x80266144 <vsnprintf+92>
> > 0xffffffff80266138 <vsnprintf+80>: bltz a0,0x802461c0 <jffs2_remount_fs+144>
>
> A branch in the delay slot of another branch is forbidden by the MIPS
> architecture. All processors I every tried this on missbehave in very
> unobvious ways when this is attempted.
>
> You may want to compare that against your vmlinux file. If the vmlinux
> binary also contains this bug, try building the affected source file with
> -S to find if the bug is cause by compiler or assembler.
It turned out to be a ground bounce problem. Interestingly the
"bug" was 100% reproducable, while normale ground problems are
totaly random. Thus i thought it has to be something in the
software.
> In single stepping mode your debugger probably executes branches by
> software emulation. Chances are the emulation does something different
> for this illegal code sequence than actual hardware.
Oh.. nice to know. Thanks.
Thanks a lot for your answer and help,
Attila Kinali
--
Praised are the Fountains of Shelieth, the silver harp of the waters,
But blest in my name forever this stream that stanched my thirst!
-- Deed of Morred
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-03-14 8:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-09 18:13 crash in first printk of start_kernel Attila Kinali
2007-03-13 1:02 ` Ralf Baechle
2007-03-14 8:04 ` Attila Kinali
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox