From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <39C653F2.FC013380@mvista.com>
Date: Mon, 18 Sep 2000 10:42:10 -0700
From: "Mark A. Greer"
Reply-To: mgreer@mvista.com
MIME-Version: 1.0
To: Alex Shnitman
CC: linuxppc-embedded@lists.linuxppc.org
Subject: Re: Sandpoint & random crashes?
References: <2F67A63DFFB1D31185D90090278CBB2D014ECF08@apmail6.chn.agilent.com>
 <39BD4215.6FD9B56F@mvista.com> <20000912001214.B17705@hectic.net>
 <39BD66B9.9A1A362F@mvista.com> <20000918171328.A4328@hectic.net>
Content-Type: text/plain; charset=us-ascii
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

Hey Alex.

I now have a sandpoint that exhibits problems like you were seeing.
That's good because now I have some to test with. In addition, I'm
planning on a fairly major overhaul sometime "soon". The problem is,
I'm neck deep in other work right now. I'll let you know when I have
that work done. If you fix any problems, please let the community know.

Thanks,

Mark
--

> Hi, Mark!
>
> On Mon, Sep 11, 2000 at 04:11:53PM -0700, you wrote the following:
>
> > > Today I think I noticed a very interesting consistency that might be
> > > helpful. I haven't had the time to test it completely; I'll do it
> > > tomorrow and post again. The thing is, there's a little green led on
> > > the board saying "backup power" or something like that. If you turn
> > > off the computer and the power supply, and leave it off for half a
> > > minute or so, the led turns off. If you turn the computer on
> > > afterwards and load the kernel, it loads init and you can work (until
> > > it crashes). If you just reset the computer and load the kernel (after
> > > uploading it via dink of course), init won't load.
> > >
> >
> > This almost sounds like a hardware problem. How old is your
> > processor module? Remember this is a test platform for MOT SPS
> > where they test out new processors, etc. They may have given you an
> > early rev board or processor or host bridge or...
> > If you have an old one, you may want to ask for a newer one.
>
> We've just bought those boards now from Motorola. I checked on their
> site and we have the latest revisions...
>
> As to hardware problems, I took the other box we have here (an
> identical configuration) and tested there, with the same results. :-(
> So if it's a hardware problem, it's in all those boards. I also ran
> the memory test that dink has on all the memory that I can (from 90000
> to the end; before that resides dink itself) and it didn't find any
> errors. (I ran all the six or seven tests, 19 times in a row -- about
> 19-20 hours of testing.) So unless I'm extremely unlucky (and there
> are problems in the low range), the memory isn't a problem either.
>
> I downloaded the compilers from CDK 1.2 and compiled a kernel with
> them. Made no difference.
>
> Here are some more crash dumps, FWIW. Something is definitely fishy in
> regard to memory management.
>
> This one is weird -- I don't have any swap.
>
>
> mount-t^H ^H^H^H
>
> sh: mount-: coBad swap file entry 00000085
> kernel BUG at swap_state.c:71!
>
> NIP: C002F0D4 XER: 20000000 LR: C002F0D4 REGS: c0283ce0 TRAP: 0700
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c0282000[7] 'sh' Last syscall: 1
> last math c0282000 last altivec 00000000
> GPR00: C002F0D4 C0283D90 C0282000 0000001F 00001032 C0105734 C0140000 00000000
> GPR08: C0140000 C0100000 C0110000 C0283CD0 44262824 100A09F8 00000000 100A6190
> GPR16: 00000000 00000000 00000000 00000000 C01310E0 C0284100 1024D000 0FF2E000
> GPR24: 00000000 00000000 0032E000 00000041 C03CDB4C C0284100 00000085 C01AF6B8
> Call backtrace:
> C002F0D4 C002F1EC C002F328 C00208B8 C002393C C0013F84 C0016FB0
> C00171F4 C0004CC0 0FE79968 1002190C 10020DC8 1001D75C 1004D09C
> 10010B08 1000FBC4 0FE6F75C 00000000
> Kernel panic: Exception in kernel pc c002f0d4 signal 4
>
> backtrace:
> 0xc002f0d4 -- 0xc002f080 + 0x0054 __delete_from_swap_cache
> 0xc002f1ec -- 0xc002f14c + 0x00a0 delete_from_swap_cache_nolock
> 0xc002f328 -- 0xc002f280 + 0x00a8 free_page_and_swap_cache
> 0xc00208b8 -- 0xc0020730 + 0x0188 zap_page_range
> 0xc002393c -- 0xc0023838 + 0x0104 exit_mmap
> 0xc0013f84 -- 0xc0013f4c + 0x0038 mmput
> 0xc0016fb0 -- 0xc0016ed0 + 0x00e0 do_exit
> 0xc00171f4 -- 0xc00171f4 + 0x0000 sys_wait4
> 0xc0004cc0 -- 0xc0004cc0 + 0x0000 ret_from_syscall_1
>
> And this one is crazy -- 14,500,000 worked fine, 15,000,000 gave me
> "Out of memory", and the middle between them gave me this:
>
> bash-2.03# perl -e '$a="A"x14750000'
> kmem_free: Bad obj addr (objp=c0177500, name=size-64)
> kernel BUG at slab.c:1695!
> NIP: C002CDD4 XER: 20000000 LR: C002CDD4 REGS: c0104cd0 TRAP: 0700
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c0103000[0] 'swapper' Last syscall: 36
> last math 00000000 last altivec 00000000
> GPR00: C002CDD4 C0104D80 C0103000 0000001B 00001032 C0105734 C0140000 00000000
> GPR08: C0140000 C0100000 C0110000 C0104CC0 24462024 100A09F8 00000000 00000000
> GPR16: 00000000 00000000 00000000 00000000 00000000 00104EB0 C02F7042 C02FFA40
> GPR24: C0177500 00000014 0000001C C01023E0 C017755C C0177FE0 C0177500 C01A0160
> Call backtrace:
> C002CDD4 C009E1E4 C009EA68 C009DA98 C009DF70 C0093E4C C00189EC
> C0004F60 00000000 C0006130 C0006144 C011678C 00003C60
> Kernel panic: Exception in kernel pc c002cdd4 signal 4
> In interrupt handler - not syncing
> Rebooting in 180 seconds..
>
> backtrace:
> 0xc002cdd4 -- 0xc002ca04 + 0x03d0 kfree
> 0xc009e1e4 -- 0xc009e0e4 + 0x0100 ip_free
> 0xc009ea68 -- 0xc009e740 + 0x0328 ip_defrag
> 0xc009da98 -- 0xc009da70 + 0x0028 ip_local_deliver
> 0xc009df70 -- 0xc009dc44 + 0x032c ip_rcv
> 0xc0093e4c -- 0xc0093c48 + 0x0204 net_rx_action
> 0xc00189ec -- 0xc0018934 + 0x00b8 do_softirq
> 0xc0004f60 -- 0xc0004f60 + 0x0000 do_bottom_half_ret
> 0x00000000 -- unknown address
> 0xc0006130 -- 0xc00060c0 + 0x0070 idled
> 0xc0006144 -- 0xc0006134 + 0x0010 cpu_idle
> 0xc011678c -- 0xc0116644 + 0x0148 start_kernel
> 0x00003c60 -- unknown address
>
> --
> Alex Shnitman | http://www.debian.org
> alexsh@hectic.net, alexsh@linux.org.il +-----------------------
> http://alexsh.hectic.net UIN 188956 PGP key on web page
> E1 F2 7B 6C A0 31 80 28 63 B8 02 BA 65 C7 8B BA
>
> /real/ kernel hackers
> dd if=/dev/urandom of=/vmlinuz
> and influence the Universal Randomosity Field.
> -- Gaal Yahas

--
Mark A. Greer (mgreer@mvista.com; 480-517-0287)
MontaVista Software, Inc.
2141 E. Broadway Road, Suite 108
Tempe, AZ 85282

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/