From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 18 Sep 2000 17:13:29 +0300
From: Alex Shnitman
To: "Mark A. Greer"
Cc: linuxppc-embedded@lists.linuxppc.org
Subject: Re: Sandpoint & random crashes?
Message-ID: <20000918171328.A4328@hectic.net>
References: <2F67A63DFFB1D31185D90090278CBB2D014ECF08@apmail6.chn.agilent.com> <39BD4215.6FD9B56F@mvista.com> <20000912001214.B17705@hectic.net> <39BD66B9.9A1A362F@mvista.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <39BD66B9.9A1A362F@mvista.com>; from mgreer@mvista.com on Mon, Sep 11, 2000 at 04:11:53PM -0700
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

Hi, Mark!

On Mon, Sep 11, 2000 at 04:11:53PM -0700, you wrote the following:

> > Today I think I noticed a very interesting consistency that might be
> > helpful. I haven't had the time to test it completely; I'll do it
> > tomorrow and post again. The thing is, there's a little green led on
> > the board saying "backup power" or something like that. If you turn
> > off the computer and the power supply, and leave it off for half a
> > minute or so, the led turns off. If you turn the computer on
> > afterwards and load the kernel, it loads init and you can work (until
> > it crashes). If you just reset the computer and load the kernel (after
> > uploading it via dink of course), init won't load.
> >
>
> This almost sounds like a hardware problem. How old is your
> processor module? Remember this is a test platform for MOT SPS
> where they test out new processors, etc. They may have given you an
> early rev board or processor or host bridge or... If you have an
> old one, you may want to ask for a newer one.

We've just bought those boards now from Motorola. I checked on their
site and we have the latest revisions... As to hardware problems, I
took the other box we have here (an identical configuration) and
tested there, with the same results. :-( So if it's a hardware
problem, it's in all those boards.
I also ran the memory test that dink has on all the memory that I can
(from 90000 to the end; before that resides dink itself) and it didn't
find any errors. (I ran all the six or seven tests, 19 times in a row
-- about 19-20 hours of testing.) So unless I'm extremely unlucky (and
there are problems in the low range), the memory isn't a problem
either.

I downloaded the compilers from CDK 1.2 and compiled a kernel with
them. Made no difference.

Here are some more crash dumps, FWIW. Something is definitely fishy in
regard to memory management.

This one is weird -- I don't have any swap.

> mount-t^H ^H^H^H
sh: mount-: coBad swap file entry 00000085
kernel BUG at swap_state.c:71!
NIP: C002F0D4 XER: 20000000 LR: C002F0D4 REGS: c0283ce0 TRAP: 0700
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0282000[7] 'sh' Last syscall: 1
last math c0282000 last altivec 00000000
GPR00: C002F0D4 C0283D90 C0282000 0000001F 00001032 C0105734 C0140000 00000000
GPR08: C0140000 C0100000 C0110000 C0283CD0 44262824 100A09F8 00000000 100A6190
GPR16: 00000000 00000000 00000000 00000000 C01310E0 C0284100 1024D000 0FF2E000
GPR24: 00000000 00000000 0032E000 00000041 C03CDB4C C0284100 00000085 C01AF6B8
Call backtrace:
C002F0D4 C002F1EC C002F328 C00208B8 C002393C C0013F84 C0016FB0 C00171F4
C0004CC0 0FE79968 1002190C 10020DC8 1001D75C 1004D09C 10010B08 1000FBC4
0FE6F75C 00000000
Kernel panic: Exception in kernel pc c002f0d4 signal 4

backtrace:
0xc002f0d4 -- 0xc002f080 + 0x0054 __delete_from_swap_cache
0xc002f1ec -- 0xc002f14c + 0x00a0 delete_from_swap_cache_nolock
0xc002f328 -- 0xc002f280 + 0x00a8 free_page_and_swap_cache
0xc00208b8 -- 0xc0020730 + 0x0188 zap_page_range
0xc002393c -- 0xc0023838 + 0x0104 exit_mmap
0xc0013f84 -- 0xc0013f4c + 0x0038 mmput
0xc0016fb0 -- 0xc0016ed0 + 0x00e0 do_exit
0xc00171f4 -- 0xc00171f4 + 0x0000 sys_wait4
0xc0004cc0 -- 0xc0004cc0 + 0x0000 ret_from_syscall_1

And this one is crazy -- 14,500,000 worked fine, 15,000,000 gave me
"Out of memory", and the middle
between them gave me this:

bash-2.03# perl -e '$a="A"x14750000'
kmem_free: Bad obj addr (objp=c0177500, name=size-64)
kernel BUG at slab.c:1695!
NIP: C002CDD4 XER: 20000000 LR: C002CDD4 REGS: c0104cd0 TRAP: 0700
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0103000[0] 'swapper' Last syscall: 36
last math 00000000 last altivec 00000000
GPR00: C002CDD4 C0104D80 C0103000 0000001B 00001032 C0105734 C0140000 00000000
GPR08: C0140000 C0100000 C0110000 C0104CC0 24462024 100A09F8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00104EB0 C02F7042 C02FFA40
GPR24: C0177500 00000014 0000001C C01023E0 C017755C C0177FE0 C0177500 C01A0160
Call backtrace:
C002CDD4 C009E1E4 C009EA68 C009DA98 C009DF70 C0093E4C C00189EC C0004F60
00000000 C0006130 C0006144 C011678C 00003C60
Kernel panic: Exception in kernel pc c002cdd4 signal 4
In interrupt handler - not syncing
Rebooting in 180 seconds..

backtrace:
0xc002cdd4 -- 0xc002ca04 + 0x03d0 kfree
0xc009e1e4 -- 0xc009e0e4 + 0x0100 ip_free
0xc009ea68 -- 0xc009e740 + 0x0328 ip_defrag
0xc009da98 -- 0xc009da70 + 0x0028 ip_local_deliver
0xc009df70 -- 0xc009dc44 + 0x032c ip_rcv
0xc0093e4c -- 0xc0093c48 + 0x0204 net_rx_action
0xc00189ec -- 0xc0018934 + 0x00b8 do_softirq
0xc0004f60 -- 0xc0004f60 + 0x0000 do_bottom_half_ret
0x00000000 -- unknown address
0xc0006130 -- 0xc00060c0 + 0x0070 idled
0xc0006144 -- 0xc0006134 + 0x0010 cpu_idle
0xc011678c -- 0xc0116644 + 0x0148 start_kernel
0x00003c60 -- unknown address

--
Alex Shnitman                            | http://www.debian.org
alexsh@hectic.net, alexsh@linux.org.il   +-----------------------
http://alexsh.hectic.net    UIN 188956    PGP key on web page
       E1 F2 7B 6C A0 31 80 28  63 B8 02 BA 65 C7 8B BA

/real/ kernel hackers dd if=/dev/urandom of=/vmlinuz and
influence the Universal Randomosity Field.
        -- Gaal Yahas

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/