Sandpoint & random crashes?

linuxppc-dev.lists.ozlabs.org archive mirror

* Sandpoint & random crashes?
@ 2000-09-10 19:49 Alex Shnitman
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Shnitman @ 2000-09-10 19:49 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I can't seem to find out why init isn't working on the
Sandpoint. Here are some more details.

In a nutshell, when the kernel gets to the point of running init, it
gets stuck. The console works, and it responds to pings, but it
doesn't proceed running init. Usually.

Once, it did run init for me (that actually was sash). I didn't do
anything different that time so I don't know how that happened. In any
case, it crashed after a minute of work, when I did the following:

bash-2.03# cat tty/driver
NIP: C0032854 XER: 20000000 LR: C00327E0 REGS: c1e23d60 TRAP: 0600
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c1e22000[41] 'cat' Last syscall: 5
last math c1e22000 last altivec 00000000
GPR00: BD3D2722 C1E23E10 C1E22000 C013D760 00000000 00000000 C013D7BC 0C44D0B7
GPR08: C0140000 BD3D2712 00000000 00000000 84262422 0184A4DC 00000000 00000000
GPR16: 00000001 7FFFFE54 7FFFFD30 00000002 00009032 01E23E80 00000000 C0004F1C
GPR24: C0004C6C 00000000 00001000 FFFFFFE9 C1E69B80 C02EBDE0 C1DFE400 C013D760
Call backtrace:
C00327E0 C00327AC C0032BB8 C0004CC0 01800724 018014AC 0170C75C
00000000
Kernel panic: kernel access of bad area pc c0032854 lr c00327e0 address BD3D2721
Rebooting in 180 seconds..

Another time, it died right after (or during) mounting the NFS filesystem:

NIP: C005F72C XER: 20000000 LR: C00CA164 REGS: c02e3d00 TRAP: 0300
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c02e2000[5] 'rpciod' Last syscall: 36
last math 00000000 last altivec 00000000
GPR00: C00CA164 C02E3DB0 C02E2000 00000001 C02F7A6C C0136F48 C01141C0 00000000
GPR08: 00000000 00000000 C0177320 00000000 0D860800 00000000 00000000 00000000
GPR16: C02E81C0 C02E814C 00000000 00000000 C02F7A68 C0100000 C0140000 C01023E0
GPR24: C0100000 C01145F0 C0110000 C0110000 C02E814C C005F6E0 C0136F48 00000000
Call backtrace:
C02E3E60 C00CA164 C00CD87C C00CDE88 C00CE96C C00094FC
Kernel panic: kernel access of bad area pc c005f72c lr c00ca164 address 20 tsk 5
Rebooting in 180 seconds..
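
The TRAP field in each dump is the PowerPC exception vector that was taken, so it hints at what kind of access went wrong. A minimal sketch for reading it, assuming the classic 32-bit PowerPC vector layout; `describe_trap` and its table are illustrative helpers, not kernel code, and only a few vectors are listed:

```python
import re

# Classic 32-bit PowerPC exception vectors (offsets into the vector table).
# Only a subset is listed; anything else is reported as unknown.
PPC_TRAPS = {
    0x0200: "machine check",
    0x0300: "data storage (DSI)",
    0x0400: "instruction storage (ISI)",
    0x0600: "alignment",
    0x0700: "program (illegal instruction, trap, ...)",
}

def describe_trap(oops_line):
    """Return (vector, name) for the TRAP field of an oops header line."""
    match = re.search(r"TRAP:\s*([0-9A-Fa-f]+)", oops_line)
    if match is None:
        raise ValueError("no TRAP field in line")
    vector = int(match.group(1), 16)
    return vector, PPC_TRAPS.get(vector, "unknown")

# Header lines copied from the two dumps above.
first = "NIP: C0032854 XER: 20000000 LR: C00327E0 REGS: c1e23d60 TRAP: 0600"
second = "NIP: C005F72C XER: 20000000 LR: C00CA164 REGS: c02e3d00 TRAP: 0300"
print(describe_trap(first))
print(describe_trap(second))
```

Read this way, the first dump is an alignment exception (the reported address BD3D2721 is odd) and the second is a data access fault at address 20, which would be consistent with a dereference through a near-NULL pointer.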

I think there were a few more crashes like this, during different
stages of trying to run init. Usually in different places, but always
with the message "kernel access of bad area". I don't have the
backtraces here, but they were perfectly normal, and since the crashes
always happened to me at different places, I suppose it's not very
important.

What does this "bad area" thing mean? How do we get rid of it?

--
Alex Shnitman                            | http://www.debian.org
alexsh@hectic.net, alexsh@linux.org.il   +-----------------------
http://alexsh.hectic.net    UIN 188956    PGP key on web page
       E1 F2 7B 6C A0 31 80 28  63 B8 02 BA 65 C7 8B BA

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* RE: Sandpoint & random crashes?
@ 2000-09-11  1:32 ZHANG,HAI-TAO (Non-A-China,ex1)
  2000-09-11 20:35 ` Mark A. Greer
  0 siblings, 1 reply; 11+ messages in thread
From: ZHANG,HAI-TAO (Non-A-China,ex1) @ 2000-09-11  1:32 UTC (permalink / raw)
  To: Alex Shnitman, linuxppc-embedded


Hi,

I ran into the same problem while using the 2.4 kernel from MontaVista Area51.
The kernel just crashes when executing init.

Here is the oops:
NIP: 00000300 XER: 20000000 LR: C004DD7C REGS: c01fda80 TRAP: 0700
MSR: 00081000 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 00
TASK = c01fc000[1] 'init' Last syscall: 11
last math 00000000 last altivec 00000000
GPR00: 00000000 C01FDB30 C01FC000 00000000 00000CD8 00000000 30026324 00000000
GPR08: C0100000 C00D0000 C00F0000 C01FDA70 24848024 00000000 10017218 10016F10
GPR16: 10000000 C01FDC4C 00000060 00000001 C1FC1640 00000003 C1FBC160 300268F4
GPR24: 30000000 30026328 000236E0 C01FDBD8 00000812 30023000 30026328 00000CD8
Call backtrace:
C004DD40 C004E3F0 C004EC84 C003F4B0 C003F720 C0006B88 C0004880
C0003974 C0008FE0
Kernel panic: Exception in kernel pc 300 signal 4
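
The flag summary on the MSR line can be recomputed from the raw value, which helps when comparing dumps from different crashes. A minimal sketch, assuming the standard 32-bit PowerPC MSR bit positions; `decode_msr` is an illustrative helper, not part of the kernel:

```python
# MSR bit masks for the fields the oops header prints (32-bit PowerPC).
MSR_BITS = {
    "EE": 0x8000,  # external interrupts enabled
    "PR": 0x4000,  # problem (user) state
    "FP": 0x2000,  # floating point available
    "ME": 0x1000,  # machine check enabled
    "IR": 0x0020,  # instruction address translation
    "DR": 0x0010,  # data address translation
}

def decode_msr(msr):
    """Map each named MSR bit to 0 or 1."""
    return {name: int(bool(msr & mask)) for name, mask in MSR_BITS.items()}

print(decode_msr(0x00081000))  # MSR from the dump above
print(decode_msr(0x00009032))  # MSR from the earlier "bad area" dumps
```

For the dump above, IR and DR both come out as 0, i.e. address translation was off when the exception was taken. That would fit a NIP inside the low exception-vector area rather than in normal kernel text, though it does not by itself say how execution got there.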

I used ksymoops to see where it crashes:
>>NIP; 00000300 Before first symbol   <=====
Trace; c004dd40 <padzero+3c/a0>
Trace; c004e3f0 <load_elf_interp+29c/2f4>
Trace; c004ec84 <load_elf_binary+6e8/950>
Trace; c003f4b0 <search_binary_handler+5c/160>
Trace; c003f720 <do_execve+16c/1fc>
Trace; c0006b88 <sys_execve+5c/dc>
Trace; c0004880 <ret_from_syscall_1+0/a0>
Trace; c0003974 <init+18/1a8>
Trace; c0008fe0 <kernel_thread+2c/38>
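
The symbol+offset form in the trace comes from looking up each address against a sorted table of function start addresses. A minimal sketch of that lookup, not ksymoops itself: the three start addresses below are derived from the trace lines above (address minus offset), and `resolve` is a hypothetical helper. A real lookup would read the full table from the kernel's System.map.

```python
import bisect

# Function start addresses, derived from the symbol+offset pairs in the
# trace above (e.g. padzero+3c at c004dd40 gives a start of c004dd04).
SYMBOLS = sorted([
    (0xC004DD04, "padzero"),
    (0xC004E154, "load_elf_interp"),
    (0xC004E59C, "load_elf_binary"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def resolve(addr):
    """Return 'name+offset' for the nearest symbol at or below addr."""
    index = bisect.bisect_right(STARTS, addr) - 1
    if index < 0:
        return "Before first symbol"
    start, name = SYMBOLS[index]
    return f"{name}+{addr - start:x}"

for address in (0x00000300, 0xC004DD40, 0xC004E3F0, 0xC004EC84):
    print(f"{address:08x} <{resolve(address)}>")
```

An address below every known symbol, such as the NIP of 00000300, has nothing to resolve against, which is why it is reported as lying before the first symbol.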

It is in fs/binfmt_elf.c:
static void padzero(unsigned long elf_bss)
{
	unsigned long nbyte;

	nbyte = ELF_PAGEOFFSET(elf_bss);
	if (nbyte) {	/* <-- the kernel crashes here, at the instruction "cmpwi r31,0" */
		nbyte = ELF_EXEC_PAGESIZE - nbyte;
		clear_user((void *) elf_bss, nbyte);
	}
}
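
The arithmetic in padzero can be checked by hand against the register dump. A minimal sketch, assuming a 4 KiB ELF_EXEC_PAGESIZE, assuming ELF_PAGEOFFSET masks an address with the page size minus one, and assuming elf_bss was 0x30026328 (the value shown in GPR25 and GPR30); the Python names are illustrative only:

```python
ELF_EXEC_PAGESIZE = 4096  # assumed 4 KiB page size

def elf_pageoffset(addr):
    """Offset of addr within its page (assumed meaning of ELF_PAGEOFFSET)."""
    return addr & (ELF_EXEC_PAGESIZE - 1)

def padzero_length(elf_bss):
    """Number of bytes padzero would clear, or 0 if elf_bss is page aligned."""
    nbyte = elf_pageoffset(elf_bss)
    if nbyte:
        nbyte = ELF_EXEC_PAGESIZE - nbyte
    return nbyte

elf_bss = 0x30026328  # assumption: value seen in GPR25 and GPR30 of the dump
print(hex(padzero_length(elf_bss)))
```

Under those assumptions the length works out to 0xcd8, the same value that appears in GPR04 and GPR31 of the dump. That suggests nbyte had already been recomputed by the time of the crash, so the fault may lie past the comparison itself, for example in the clear_user call.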

But I cannot figure out what is wrong with that line.

Any suggestions?

Thanks,
Haitao Zhang





* Re: Sandpoint & random crashes?
  2000-09-11  1:32 ZHANG,HAI-TAO (Non-A-China,ex1)
@ 2000-09-11 20:35 ` Mark A. Greer
  2000-09-11 21:12   ` Alex Shnitman
  0 siblings, 1 reply; 11+ messages in thread
From: Mark A. Greer @ 2000-09-11 20:35 UTC (permalink / raw)
  To: ZHANG,HAI-TAO (Non-A-China,ex1); +Cc: Alex Shnitman, linuxppc-embedded


Guys, I'm trying to recreate what you're seeing but I can't.  Definitely seems
like something is amiss.  The NIP: 00000300 one is definitely interesting.

I'm on an 8240 and I can't make it crash.  Correct me if I'm wrong: Alex,
you're using a 7400 with a 107; Hai-Tao (is that correct?), you're using a 750
with a 107, right?  Any other info on your systems/processor boards that may
be useful for me to know?

It sounds like this is happening during boot up.  Is that correct?  If you do get
up, what sequence of commands causes it to fail?  I need help recreating it here.

You're both using the toolchain and root filesystem from the MontaVista 1.2
CDK, right?  I use the root filesystem in
/opt/hardhat/devkit/ppc/82xx/target.  What are you using?

I boot fine over NFS and with the above root filesystem on an IDE drive.  Are
both of you using NFS?  If so, make sure that you have "IP: BOOTP support"
selected under the "Networking options" menu item and "Root file system on
NFS" selected under the "File systems/Network File Systems" menu item.


Thanks,

Mark

--
Mark A. Greer (mgreer@mvista.com; 480-517-0287)
MontaVista Software, Inc.
2141 E. Broadway Road, Suite 108
Tempe, AZ  85282



* Re: Sandpoint & random crashes?
  2000-09-11 20:35 ` Mark A. Greer
@ 2000-09-11 21:12   ` Alex Shnitman
  2000-09-11 23:11     ` Mark A. Greer
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Shnitman @ 2000-09-11 21:12 UTC (permalink / raw)
  To: linuxppc-embedded

Hi, Mark!

On Mon, Sep 11, 2000 at 01:35:33PM -0700, you wrote the following:

> Guys, I'm trying to recreate what you're seeing but I can't.  Definitely seems
> like something is amiss.  The NIP: 00000300 one is definitely interesting.

I always get 'kernel access of bad area' when it crashes, each time
with a different address. (They don't seem to be in any particular
range.) That's what I get if it boots, after working for some
time. Usually it doesn't -- it gets stuck when running init, as I
wrote in a previous message.

Today I think I noticed a very interesting consistency that might be
helpful. I haven't had the time to test it completely; I'll do it
tomorrow and post again. The thing is, there's a little green led on
the board saying "backup power" or something like that. If you turn
off the computer and the power supply, and leave it off for half a
minute or so, the led turns off. If you turn the computer on
afterwards and load the kernel, it loads init and you can work (until
it crashes). If you just reset the computer and load the kernel (after
uploading it via dink of course), init won't load.

I'll verify this conclusively tomorrow, but if you have any ideas off the
top of your head now, they'd be most helpful.

> I'm on an 8240 and i can't make it crash.  Correct me if I'm wrong, Alex
> you're using a 7400 with a 107; Hai-Tao (is that correct?) you're using a 750
> with a 107, right??  Any other info on your systems/processor boards that may
> be useful for me to know?

That's what I'm using, indeed. Nothing else to say.. The system is as
it was shipped by Motorola, with just the board switches changed as
you posted some time ago.

> It sounds like this is happening during boot up.  Is that correct?  If you get
> up, what sequence of cmds cause it to fail?  I need help recreating it here.

If I do get the kernel to run init (I always use init=/bin/sash), it
crashes after I've worked for a while, with no apparent pattern. I
tend to poke through /proc after I get it to boot (lots of interesting
stuff there ;-) so often it crashes when I cat one of the files. (But
catting the same files works at other times.) At one time it crashed
when I entered a non-existent command by mistake. :-) There's some
corruption going on; it really looks to me like it doesn't have
anything to do with my activity on the box at the time.

> You're both using the toolchain and root filesystem from the MontaVista 1.2
> CDK, right?  I use the root filesystem in
> /opt/hardhat/devkit/ppc/82xx/target.  What are you using?

Truth is, I've been a bad boy -- I'm using the emdebian
cross-compilers and not CDK. (They're available as debs so they
integrate better here, that's the only reason I chose them.) It's gcc
2.95.2. I suppose I should try the CDK compilers.. Although I doubt
very much that it would make a difference.

As for the filesystem, I have a lot of stuff there, but basically I'm
using sash from CDK to test it. I also have libc and bash and such
stuff there, which I took from Debian's base.tgz for PowerPC, and
usually if sash loads I type "bash", which works.

I guess I should really try the CDK environment alone at some
point.. However, the question of init not loading still remains.

> I boot fine over NFS and with the above root filesystem on an IDE drive.  Are
> both of you using NFS?  If so, make sure that you have the "IP: BOOTP support"
> selected under the "Networking options" menu item and the "Root file system on
> NFS" on under "File systems/Network File Systems" menu item.

I'm using NFS. Obviously these two are selected, since the machine
gets its IP address and successfully mounts the filesystem and
occasionally even loads a shell for me. :-)

Thanks a lot for your efforts!

--
Alex Shnitman                            | http://www.debian.org
alexsh@hectic.net, alexsh@linux.org.il   +-----------------------
http://alexsh.hectic.net    UIN 188956    PGP key on web page
       E1 F2 7B 6C A0 31 80 28  63 B8 02 BA 65 C7 8B BA

Wear short sleeves! Support your right to bare arms!


* Re: Sandpoint & random crashes?
  2000-09-11 21:12   ` Alex Shnitman
@ 2000-09-11 23:11     ` Mark A. Greer
  2000-09-11 23:53       ` Ron Bianco
  2000-09-18 14:13       ` Alex Shnitman
  0 siblings, 2 replies; 11+ messages in thread
From: Mark A. Greer @ 2000-09-11 23:11 UTC (permalink / raw)
  To: Alex Shnitman; +Cc: linuxppc-embedded

Alex Shnitman wrote:

> Hi, Mark!
> Today I think I noticed a very interesting consistency that might be
> helpful. I haven't had the time to test it completely; I'll do it
> tomorrow and post again. The thing is, there's a little green led on
> the board saying "backup power" or something like that. If you turn
> off the computer and the power supply, and leave it off for half a
> minute or so, the led turns off. If you turn the computer on
> afterwards and load the kernel, it loads init and you can work (until
> it crashes). If you just reset the computer and load the kernel (after
> uploading it via dink of course), init won't load.
>

This almost sounds like a hardware problem.  How old is your processor module?
Remember this is a test platform for MOT SPS where they test out new processors,
etc.  They may have given you an early rev board or processor or host bridge or...
If you have an old one, you may want to ask for a newer one.

I perused through /proc cat-ing files as well.  Haven't had any problems.  I'll
keep an eye out.   Please let me/us know what you discover in the meantime.

Thanks,

Mark


* RE: Sandpoint & random crashes?
  2000-09-11 23:11     ` Mark A. Greer
@ 2000-09-11 23:53       ` Ron Bianco
  2000-09-12  6:57         ` Alex Shnitman
  2000-09-18 14:13       ` Alex Shnitman
  1 sibling, 1 reply; 11+ messages in thread
From: Ron Bianco @ 2000-09-11 23:53 UTC (permalink / raw)
  To: linuxppc-embedded


Another possible hardware-related problem might be that the DINK monitor is not
programming the SDRAM controller with the right parameters for that DIMM module (I assume
the Sandpoint uses DIMMs).
I'm not sure whether the monitor reads the parameter data from the module's EEPROM or is manually
configured, with the settings stored in CMOS RAM.

Anyway, have you run any memory tests?

My 2 pesos' worth, good luck.

Ron




* RE: Sandpoint & random crashes?
@ 2000-09-12  1:25 ZHANG,HAI-TAO (Non-A-China,ex1)
  2000-09-12 17:44 ` Mark A. Greer
  0 siblings, 1 reply; 11+ messages in thread
From: ZHANG,HAI-TAO (Non-A-China,ex1) @ 2000-09-12  1:25 UTC (permalink / raw)
  To: mgreer; +Cc: Alex Shnitman, linuxppc-embedded

Hi,

I am using the 603e + 107 Sandpoint board, with the NFS root from
MontaVista CDK 1.2. I have booted successfully with the 2.3.16 kernel
from MontaVista Area51, added the IDE part for your 2.4 version, and it is OK
for NFS boot. So I am not sure what the difference between those two
versions is, and I guess that there are some errors in 2.4 which crash the
kernel.

Thanks,
Haitao


* Re: Sandpoint & random crashes?
  2000-09-11 23:53       ` Ron Bianco
@ 2000-09-12  6:57         ` Alex Shnitman
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Shnitman @ 2000-09-12  6:57 UTC (permalink / raw)
  To: linuxppc-embedded


Hi, Ron!

On Mon, Sep 11, 2000 at 04:53:00PM -0700, you wrote the following:

> Another possible hardware related problem might be that the Dink
> monitor is not programming the SDRAM controller with the right
> parameters for that DIMM module ( I assume the sandpoint uses DIMMs
> ).  I'm not sure if the monitor reads the param data from the
> module's eeprom or is manually configured and stored in CMOS ram.

It's a SIMM module. Does that make any difference?
(You can see it at:
http://www.mot.com/SPS/PowerPC/teksupport/refdesigns/sandpoint.html#7400 )

> Anyway, have you run any memory tests?

No, how do I do that? What memory test applications are there for this
platform?


--
Alex Shnitman                            | http://www.debian.org
alexsh@hectic.net, alexsh@linux.org.il   +-----------------------
http://alexsh.hectic.net    UIN 188956    PGP key on web page
       E1 F2 7B 6C A0 31 80 28  63 B8 02 BA 65 C7 8B BA

/real/ kernel hackers
    dd if=/dev/urandom of=/vmlinuz
and influence the Universal Randomosity Field.
	-- Gaal Yahas


* Re: Sandpoint & random crashes?
  2000-09-12  1:25 Sandpoint & random crashes? ZHANG,HAI-TAO (Non-A-China,ex1)
@ 2000-09-12 17:44 ` Mark A. Greer
  0 siblings, 0 replies; 11+ messages in thread
From: Mark A. Greer @ 2000-09-12 17:44 UTC (permalink / raw)
  To: ZHANG,HAI-TAO (Non-A-China,ex1); +Cc: Alex Shnitman, linuxppc-embedded

This isn't a total surprise.  I've had problems with this version of the
kernel.  Most seem to be timing problems.  I managed to get things stable on
the processor/host-bridge combos that I use--I just ran some tests on a 755/107
overnight and it's still working fine.  Evidently, and not surprisingly, your
timings are different.

Probably the real fix is to get the latest ppc tree stable, then move the
sandpoint stuff forward to that.

Mark
--
Mark A. Greer (mgreer@mvista.com; 480-517-0287)
MontaVista Software, Inc.
2141 E. Broadway Road, Suite 108
Tempe, AZ  85282


* Re: Sandpoint & random crashes?
  2000-09-11 23:11     ` Mark A. Greer
  2000-09-11 23:53       ` Ron Bianco
@ 2000-09-18 14:13       ` Alex Shnitman
  2000-09-18 17:42         ` Mark A. Greer
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Shnitman @ 2000-09-18 14:13 UTC (permalink / raw)
  To: Mark A. Greer; +Cc: linuxppc-embedded


Hi, Mark!

On Mon, Sep 11, 2000 at 04:11:53PM -0700, you wrote the following:

> > Today I think I noticed a very interesting consistency that might be
> > helpful. I haven't had the time to test it completely; I'll do it
> > tomorrow and post again. The thing is, there's a little green led on
> > the board saying "backup power" or something like that. If you turn
> > off the computer and the power supply, and leave it off for half a
> > minute or so, the led turns off. If you turn the computer on
> > afterwards and load the kernel, it loads init and you can work (until
> > it crashes). If you just reset the computer and load the kernel (after
> > uploading it via dink of course), init won't load.
> >
>
> This almost sounds like a hardware problem.  How old is your
> processor module?  Remember this is a test platform for MOT SPS
> where they test out new processors, etc.  They may have given you an
> early rev board or processor or host bridge or...  If you have an
> old one, you may want to ask for a newer one.

We've just bought those boards now from Motorola. I checked on their
site and we have the latest revisions...

As to hardware problems, I took the other box we have here (an
identical configuration) and tested there, with the same results. :-(
So if it's a hardware problem, it's in all those boards. I also ran
the memory test that dink has on all the memory that I can (from 90000
to the end; before that resides dink itself) and it didn't find any
errors. (I ran all the six or seven tests, 19 times in a row -- about
19-20 hours of testing.) So unless I'm extremely unlucky (and there
are problems in the low range), the memory isn't a problem either.

I downloaded the compilers from CDK 1.2 and compiled a kernel with
them. Made no difference.

Here are some more crash dumps, FWIW. Something is definitely fishy in
regard to memory management.

This one is weird -- I don't have any swap.

> mount-t^H ^H^H^H

sh: mount-: coBad swap file entry 00000085
kernel BUG at swap_state.c:71!

NIP: C002F0D4 XER: 20000000 LR: C002F0D4 REGS: c0283ce0 TRAP: 0700
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0282000[7] 'sh' Last syscall: 1
last math c0282000 last altivec 00000000
GPR00: C002F0D4 C0283D90 C0282000 0000001F 00001032 C0105734 C0140000 00000000
GPR08: C0140000 C0100000 C0110000 C0283CD0 44262824 100A09F8 00000000 100A6190
GPR16: 00000000 00000000 00000000 00000000 C01310E0 C0284100 1024D000 0FF2E000
GPR24: 00000000 00000000 0032E000 00000041 C03CDB4C C0284100 00000085 C01AF6B8
Call backtrace:
C002F0D4 C002F1EC C002F328 C00208B8 C002393C C0013F84 C0016FB0
C00171F4 C0004CC0 0FE79968 1002190C 10020DC8 1001D75C 1004D09C
10010B08 1000FBC4 0FE6F75C 00000000
Kernel panic: Exception in kernel pc c002f0d4 signal 4

backtrace:
0xc002f0d4 -- 0xc002f080 + 0x0054   __delete_from_swap_cache
0xc002f1ec -- 0xc002f14c + 0x00a0   delete_from_swap_cache_nolock
0xc002f328 -- 0xc002f280 + 0x00a8   free_page_and_swap_cache
0xc00208b8 -- 0xc0020730 + 0x0188   zap_page_range
0xc002393c -- 0xc0023838 + 0x0104   exit_mmap
0xc0013f84 -- 0xc0013f4c + 0x0038   mmput
0xc0016fb0 -- 0xc0016ed0 + 0x00e0   do_exit
0xc00171f4 -- 0xc00171f4 + 0x0000   sys_wait4
0xc0004cc0 -- 0xc0004cc0 + 0x0000   ret_from_syscall_1


And this one is crazy -- 14,500,000 worked fine, 15,000,000 gave me
"Out of memory", and the middle between them gave me this:

bash-2.03# perl -e '$a="A"x14750000'
kmem_free: Bad obj addr (objp=c0177500, name=size-64)
kernel BUG at slab.c:1695!
NIP: C002CDD4 XER: 20000000 LR: C002CDD4 REGS: c0104cd0 TRAP: 0700
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0103000[0] 'swapper' Last syscall: 36
last math 00000000 last altivec 00000000
GPR00: C002CDD4 C0104D80 C0103000 0000001B 00001032 C0105734 C0140000 00000000
GPR08: C0140000 C0100000 C0110000 C0104CC0 24462024 100A09F8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00104EB0 C02F7042 C02FFA40
GPR24: C0177500 00000014 0000001C C01023E0 C017755C C0177FE0 C0177500 C01A0160
Call backtrace:
C002CDD4 C009E1E4 C009EA68 C009DA98 C009DF70 C0093E4C C00189EC
C0004F60 00000000 C0006130 C0006144 C011678C 00003C60
Kernel panic: Exception in kernel pc c002cdd4 signal 4
In interrupt handler - not syncing
Rebooting in 180 seconds..

backtrace:
0xc002cdd4 -- 0xc002ca04 + 0x03d0   kfree
0xc009e1e4 -- 0xc009e0e4 + 0x0100   ip_free
0xc009ea68 -- 0xc009e740 + 0x0328   ip_defrag
0xc009da98 -- 0xc009da70 + 0x0028   ip_local_deliver
0xc009df70 -- 0xc009dc44 + 0x032c   ip_rcv
0xc0093e4c -- 0xc0093c48 + 0x0204   net_rx_action
0xc00189ec -- 0xc0018934 + 0x00b8   do_softirq
0xc0004f60 -- 0xc0004f60 + 0x0000   do_bottom_half_ret
0x00000000 -- unknown address
0xc0006130 -- 0xc00060c0 + 0x0070   idled
0xc0006144 -- 0xc0006134 + 0x0010   cpu_idle
0xc011678c -- 0xc0116644 + 0x0148   start_kernel
0x00003c60 -- unknown address



--
Alex Shnitman                            | http://www.debian.org
alexsh@hectic.net, alexsh@linux.org.il   +-----------------------
http://alexsh.hectic.net    UIN 188956    PGP key on web page
       E1 F2 7B 6C A0 31 80 28  63 B8 02 BA 65 C7 8B BA

/real/ kernel hackers
    dd if=/dev/urandom of=/vmlinuz
and influence the Universal Randomosity Field.
	-- Gaal Yahas


* Re: Sandpoint & random crashes?
  2000-09-18 14:13       ` Alex Shnitman
@ 2000-09-18 17:42         ` Mark A. Greer
  0 siblings, 0 replies; 11+ messages in thread
From: Mark A. Greer @ 2000-09-18 17:42 UTC (permalink / raw)
  To: Alex Shnitman; +Cc: linuxppc-embedded


Hey Alex.

I now have a Sandpoint that exhibits problems like the ones you were seeing.  That's
good, because now I have something to test with.

In addition, I'm planning on a fairly major overhaul sometime "soon".  The
problem is, I'm neck deep in other work right now.

I'll let you know when I have that work done.  If you fix any problems, please
let the community know.

Thanks,

Mark
--

> Hi, Mark!
>
> On Mon, Sep 11, 2000 at 04:11:53PM -0700, you wrote the following:
>
> > > Today I think I noticed a very interesting consistency that might be
> > > helpful. I haven't had the time to test it completely; I'll do it
> > > tomorrow and post again. The thing is, there's a little green LED on
> > > the board saying "backup power" or something like that. If you turn
> > > off the computer and the power supply, and leave it off for half a
> > > minute or so, the LED turns off. If you turn the computer on
> > > afterwards and load the kernel, it loads init and you can work (until
> > > it crashes). If you just reset the computer and load the kernel (after
> > > uploading it via dink of course), init won't load.
> > >
> >
> > This almost sounds like a hardware problem.  How old is your
> > processor module?  Remember this is a test platform for MOT SPS
> > where they test out new processors, etc.  They may have given you an
> > early rev board or processor or host bridge or...  If you have an
> > old one, you may want to ask for a newer one.
>
> We've just bought those boards now from Motorola. I checked on their
> site and we have the latest revisions...
>
> As to hardware problems, I took the other box we have here (an
> identical configuration) and tested there, with the same results. :-(
> So if it's a hardware problem, it's in all those boards. I also ran
> the memory test that dink has on all the memory that I can (from 90000
> to the end; before that resides dink itself) and it didn't find any
> errors. (I ran all the six or seven tests, 19 times in a row -- about
> 19-20 hours of testing.) So unless I'm extremely unlucky (and there
> are problems in the low range), the memory isn't a problem either.
>
> I downloaded the compilers from CDK 1.2 and compiled a kernel with
> them. Made no difference.
>
> Here are some more crash dumps, FWIW. Something is definitely fishy in
> regard to memory management.
>
> This one is weird -- I don't have any swap.
>
> > mount-t^H ^H^H^H
>
> sh: mount-: coBad swap file entry 00000085
> kernel BUG at swap_state.c:71!
>
> NIP: C002F0D4 XER: 20000000 LR: C002F0D4 REGS: c0283ce0 TRAP: 0700
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c0282000[7] 'sh' Last syscall: 1
> last math c0282000 last altivec 00000000
> GPR00: C002F0D4 C0283D90 C0282000 0000001F 00001032 C0105734 C0140000 00000000
> GPR08: C0140000 C0100000 C0110000 C0283CD0 44262824 100A09F8 00000000 100A6190
> GPR16: 00000000 00000000 00000000 00000000 C01310E0 C0284100 1024D000 0FF2E000
> GPR24: 00000000 00000000 0032E000 00000041 C03CDB4C C0284100 00000085 C01AF6B8
> Call backtrace:
> C002F0D4 C002F1EC C002F328 C00208B8 C002393C C0013F84 C0016FB0
> C00171F4 C0004CC0 0FE79968 1002190C 10020DC8 1001D75C 1004D09C
> 10010B08 1000FBC4 0FE6F75C 00000000
> Kernel panic: Exception in kernel pc c002f0d4 signal 4
>
> backtrace:
> 0xc002f0d4 -- 0xc002f080 + 0x0054   __delete_from_swap_cache
> 0xc002f1ec -- 0xc002f14c + 0x00a0   delete_from_swap_cache_nolock
> 0xc002f328 -- 0xc002f280 + 0x00a8   free_page_and_swap_cache
> 0xc00208b8 -- 0xc0020730 + 0x0188   zap_page_range
> 0xc002393c -- 0xc0023838 + 0x0104   exit_mmap
> 0xc0013f84 -- 0xc0013f4c + 0x0038   mmput
> 0xc0016fb0 -- 0xc0016ed0 + 0x00e0   do_exit
> 0xc00171f4 -- 0xc00171f4 + 0x0000   sys_wait4
> 0xc0004cc0 -- 0xc0004cc0 + 0x0000   ret_from_syscall_1
>
> And this one is crazy -- 14,500,000 worked fine, 15,000,000 gave me
> "Out of memory", and the middle between them gave me this:
>
> bash-2.03# perl -e '$a="A"x14750000'
> kmem_free: Bad obj addr (objp=c0177500, name=size-64)
> kernel BUG at slab.c:1695!
> NIP: C002CDD4 XER: 20000000 LR: C002CDD4 REGS: c0104cd0 TRAP: 0700
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c0103000[0] 'swapper' Last syscall: 36
> last math 00000000 last altivec 00000000
> GPR00: C002CDD4 C0104D80 C0103000 0000001B 00001032 C0105734 C0140000 00000000
> GPR08: C0140000 C0100000 C0110000 C0104CC0 24462024 100A09F8 00000000 00000000
> GPR16: 00000000 00000000 00000000 00000000 00000000 00104EB0 C02F7042 C02FFA40
> GPR24: C0177500 00000014 0000001C C01023E0 C017755C C0177FE0 C0177500 C01A0160
> Call backtrace:
> C002CDD4 C009E1E4 C009EA68 C009DA98 C009DF70 C0093E4C C00189EC
> C0004F60 00000000 C0006130 C0006144 C011678C 00003C60
> Kernel panic: Exception in kernel pc c002cdd4 signal 4
> In interrupt handler - not syncing
> Rebooting in 180 seconds..
>
> backtrace:
> 0xc002cdd4 -- 0xc002ca04 + 0x03d0   kfree
> 0xc009e1e4 -- 0xc009e0e4 + 0x0100   ip_free
> 0xc009ea68 -- 0xc009e740 + 0x0328   ip_defrag
> 0xc009da98 -- 0xc009da70 + 0x0028   ip_local_deliver
> 0xc009df70 -- 0xc009dc44 + 0x032c   ip_rcv
> 0xc0093e4c -- 0xc0093c48 + 0x0204   net_rx_action
> 0xc00189ec -- 0xc0018934 + 0x00b8   do_softirq
> 0xc0004f60 -- 0xc0004f60 + 0x0000   do_bottom_half_ret
> 0x00000000 -- unknown address
> 0xc0006130 -- 0xc00060c0 + 0x0070   idled
> 0xc0006144 -- 0xc0006134 + 0x0010   cpu_idle
> 0xc011678c -- 0xc0116644 + 0x0148   start_kernel
> 0x00003c60 -- unknown address
>

--
Mark A. Greer (mgreer@mvista.com; 480-517-0287)
MontaVista Software, Inc.
2141 E. Broadway Road, Suite 108
Tempe, AZ  85282



Thread overview: 11+ messages
2000-09-12  1:25 Sandpoint & random crashes? ZHANG,HAI-TAO (Non-A-China,ex1)
2000-09-12 17:44 ` Mark A. Greer
  -- strict thread matches above, loose matches on Subject: below --
2000-09-11  1:32 ZHANG,HAI-TAO (Non-A-China,ex1)
2000-09-11 20:35 ` Mark A. Greer
2000-09-11 21:12   ` Alex Shnitman
2000-09-11 23:11     ` Mark A. Greer
2000-09-11 23:53       ` Ron Bianco
2000-09-12  6:57         ` Alex Shnitman
2000-09-18 14:13       ` Alex Shnitman
2000-09-18 17:42         ` Mark A. Greer
2000-09-10 19:49 Alex Shnitman
