* Help with OOPSes, anyone?
@ 2002-01-27 8:22 Matthew Dharm
2002-01-27 10:53 ` Jason Gunthorpe
2002-01-27 17:33 ` Pete Popov
0 siblings, 2 replies; 18+ messages in thread
From: Matthew Dharm @ 2002-01-27 8:22 UTC (permalink / raw)
To: linux-mips
[-- Attachment #1: Type: text/plain, Size: 3410 bytes --]
So, I'm back to trying to get Linux running on our boards, and I'm having a
problem I'd like some help with...
(See http://www.momenco.com/products/ocelot if you're interested.)
At one point, we had 2.4.5 working on this board. That was the result of
joint work between myself, RedHat, and MontaVista. Seemed to work pretty
well, given that we had basically no userspace to work with. So, while it
seemed functional, it's not much of a datapoint.
So, I did a cvs update, and tried to build 2.4.17. Lo and behold, it still
compiles. And it even runs. And the RedHat 7.1 userspace from oss.sgi.com
even seems to work, mostly.
But, under certain conditions, the kernel OOPSes. Attached to this message
are a few of those OOPSes (serial console is wonderful!) along with the
ksymoops output. I think the read_lsmod() warning is bogus, because there
are, actually, no modules loaded.
My instincts are telling me that these are all being caused by the same
problem, but I'll be damned if I can figure out what that is. Caching is a
good suspect, but that's just because it's always a good suspect.
What does work is a ping -f to the board... so I think the interrupt code
is rock-solid. It's pretty simple anyway, so I'm not surprised that it isn't
a problem.
While we did have some problem supporting our boards that have 512MB on
them, the board I'm testing with only has 128MB. Dealing with the 512MB
problems (which, I understand, are caused by the cache-flush routines
trying to flush too much), is on the TODO list, but I don't think those
problems are affecting me right now.
Load does seem to be a factor. Big compiles, lots of NFS traffic, that
sort of thing seems to be the triggering factor. It's easy to duplicate,
given a few minutes. Oddly enough, light loads or idle don't trigger this
-- I FTPed the SRPM for wget and built it without any problems. Heck, it
even works! But when I try to build something bigger (say, ncftp or
glibc), it dies an ugly death. Heck, I could FTP, build, and use ksymoops
natively on the board without any problems. Lots of things work fine,
but sometimes....
I know Ralf has one of our boards, but I don't know if he's in the same
country as that board... Ralf?
In these OOPSes, one is caused by some code in unaligned.c -- I've seen
several (many) like this, tho I only captured and decoded one. The code in
question seems to be one of those "you can never get here" situations,
which makes me really worry. The other two look like some form of
NULL-pointer dereference.
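
The split described above (one alignment fault, two NULL-pointer style faults) can be cross-checked against the Cause values in the attached dumps. On MIPS the exception code sits in bits 6..2 of Cause. The Python sketch below is an editorial illustration, not part of the original report; the helper names are made up, and the table lists only the first eight codes.

```python
# Exception codes from bits 6..2 of the MIPS Cause register (first eight only).
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modification)",
    2: "TLBL (TLB exception on load or fetch)",
    3: "TLBS (TLB exception on store)",
    4: "AdEL (address error on load or fetch)",
    5: "AdES (address error on store)",
    6: "IBE (bus error on fetch)",
    7: "DBE (bus error on data access)",
}


def exc_code(cause):
    """Return the ExcCode field (bits 6..2) of a Cause register value."""
    return (cause >> 2) & 0x1F


def describe(cause):
    """Map a Cause value to a short name, falling back to the raw code."""
    code = exc_code(cause)
    return EXC_CODES.get(code, "code %d" % code)


# Cause values copied from the three attached oops dumps.
for name, cause in [("oops", 0x00008010), ("oops2", 0x00008008), ("oops3", 0x0000800C)]:
    print(name, describe(cause))
```

Read this way, the first dump is an address error on a load, while the other two are TLB faults on a load and on a store, which fits faulting addresses as low as 00000062 and 00000004.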
I've tried several different configurations, stripping down drivers as I
go. I'm going to continue that tomorrow/Monday, but I don't have high
hopes that will fix the problem. I'm thinking about trying
CONFIG_MIPS_UNCACHED, but I don't know if that works on an RM7000 processor
-- the L1 and L2 are built-in to the processor, and I don't think the L1
can be deactivated. Then again, I don't know how CONFIG_MIPS_UNCACHED
works, so if someone would like to clue me in on the subject, I'd really
appreciate it.
Another thing I'm going to try is an ELF image from that RedHat/MontaVista
work, to see if it shows the same problems. That datapoint will be useful
also.
If anyone has any clues as to what's going on, I'd really appreciate it.
Anyone?
Matt
--
Matthew Dharm Work: mdharm@momenco.com
Senior Software Designer, Momentum Computer
[-- Attachment #2: oops.in --]
[-- Type: text/plain, Size: 1333 bytes --]
Unhandled kernel unaligned access in unaligned.c:emulate_load_store_insn, line 345:
$0 : 00000000 90045400 813fb004 021dc71d 813fb000 00000003 00000001 85a19020
$8 : 00000028 80242788 00000000 0065002c 3c520292 00000000 3c520292 00000000
$16: 021dc71d 813fa3fc 00000001 90045401 813fb004 802c3aa8 00000003 00008001
$24: 8129ea64 85750cc0 813f6000 813f7e90 813f7e90 802396d8
Hi : 00000000
Lo : 0000000e
epc : 8010a3fc Not tainted
Status: 90045402
Cause : 00008010
Process rpciod (pid: 8, stackpage=813f6000)
Stack: 00000002 8023a360 85747274 90045401 813fa000 813fa3fc 85750bc0 802c64c8
802c64c8 8029541c 813ff410 00000000 802396d8 802396bc 85750bc0 813fe8e0
85e2c140 85a19620 813fa000 8023904c 813fa314 85747700 85750bc0 813fe8e0
85750bc0 85750c14 00000005 8023b1a4 85750c14 85750bc0 00000005 802c64c8
00000000 85750bc0 8023a92c 00008400 acd8a3f8 85461000 3c520291 00000000
00000000 ...
Call Trace: [<8023a360>] [<802396d8>] [<802396bc>] [<8023904c>] [<8023b1a4>] [<8023a92c>]
[<8023ab9c>] [<8023aa54>] [<8023b7b8>] [<8023b620>] [<8023b620>] [<80100da0>]
[<8023bb8c>] [<8023c1a0>] [<80100d90>]
Code: 00c09021 3c15802c 26b53aa8 <8e05fffc> 8ca20000 00561024 1040002f 00001821 40116000
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
[-- Attachment #3: oops.out --]
[-- Type: text/plain, Size: 3392 bytes --]
ksymoops 2.4.0 on mips 2.4.17. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.17/ (default)
-m /boot/System.map-2.4.17 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
$0 : 00000000 90045400 813fb004 021dc71d 813fb000 00000003 00000001 85a19020
$8 : 00000028 80242788 00000000 0065002c 3c520292 00000000 3c520292 00000000
$16: 021dc71d 813fa3fc 00000001 90045401 813fb004 802c3aa8 00000003 00008001
$24: 8129ea64 85750cc0 813f6000 813f7e90 813f7e90 802396d8
epc : 8010a3fc Not tainted
Using defaults from ksymoops -t elf32-tradbigmips -a mips:3000
Status: 90045402
Cause : 00008010
Process rpciod (pid: 8, stackpage=813f6000)
Stack: 00000002 8023a360 85747274 90045401 813fa000 813fa3fc 85750bc0 802c64c8
802c64c8 8029541c 813ff410 00000000 802396d8 802396bc 85750bc0 813fe8e0
85e2c140 85a19620 813fa000 8023904c 813fa314 85747700 85750bc0 813fe8e0
85750bc0 85750c14 00000005 8023b1a4 85750c14 85750bc0 00000005 802c64c8
00000000 85750bc0 8023a92c 00008400 acd8a3f8 85461000 3c520291 00000000
00000000 ...
Call Trace: [<8023a360>] [<802396d8>] [<802396bc>] [<8023904c>] [<8023b1a4>] [<8023a92c>]
[<8023ab9c>] [<8023aa54>] [<8023b7b8>] [<8023b620>] [<8023b620>] [<80100da0>]
[<8023bb8c>] [<8023c1a0>] [<80100d90>]
Code: 00c09021 3c15802c 26b53aa8 <8e05fffc> 8ca20000 00561024 1040002f 00001821 40116000
>>PC; 8010a3fc <__wake_up+6c/198> <=====
Trace; 8023a360 <rpc_wake_up_next+6c/ac>
Trace; 802396d8 <xprt_clear_backlog+48/5c>
Trace; 802396bc <xprt_clear_backlog+2c/5c>
Trace; 8023904c <xprt_release+cc/110>
Trace; 8023b1a4 <rpc_release_task+1d4/240>
Trace; 8023a92c <__rpc_execute+378/3a0>
Trace; 8023ab9c <__rpc_schedule+1a0/20c>
Trace; 8023aa54 <__rpc_schedule+58/20c>
Trace; 8023b7b8 <rpciod+198/370>
Trace; 8023b620 <rpciod+0/370>
Trace; 8023b620 <rpciod+0/370>
Trace; 80100da0 <kernel_thread+40/70>
Trace; 8023bb8c <rpciod_up+d4/180>
Trace; 8023c1a0 <rpcauth_create+40/58>
Trace; 80100d90 <kernel_thread+30/70>
Code; 8010a3f0 <__wake_up+60/198>
00000000 <_PC>:
Code; 8010a3f0 <__wake_up+60/198>
0: 00c09021 move s2,a2
Code; 8010a3f4 <__wake_up+64/198>
4: 3c15802c lui s5,0x802c
Code; 8010a3f8 <__wake_up+68/198>
8: 26b53aa8 addiu s5,s5,15016
Code; 8010a3fc <__wake_up+6c/198> <=====
c: 8e05fffc lw a1,-4(s0) <=====
Code; 8010a400 <__wake_up+70/198>
10: 8ca20000 lw v0,0(a1)
Code; 8010a404 <__wake_up+74/198>
14: 00561024 and v0,v0,s6
Code; 8010a408 <__wake_up+78/198>
18: 1040002f beqz v0,d8 <_PC+0xd8> 8010a4c8 <__wake_up+138/198>
Code; 8010a40c <__wake_up+7c/198>
1c: 00001821 move v1,zero
Code; 8010a410 <__wake_up+80/198>
20: 40116000 mfc0 s1,$12
Kernel panic: Aiee, killing interrupt handler!
2 warnings issued. Results may not be reliable.
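
For readers following the disassembly, the faulting `lw a1,-4(s0)` can be checked by hand against the register dump: $16 (s0) holds 021dc71d, so the load targets 021dc719, which is not word aligned. The Python sketch below reproduces that arithmetic. It decodes only a handful of I-type loads and stores, and the function names are illustrative rather than taken from the kernel or from ksymoops.

```python
# Opcode -> (mnemonic, access size in bytes) for a few I-type loads and stores.
LOAD_STORE = {
    0x21: ("lh", 2),
    0x23: ("lw", 4),
    0x25: ("lhu", 2),
    0x29: ("sh", 2),
    0x2B: ("sw", 4),
}


def decode_load_store(word):
    """Split a 32-bit instruction word into (mnemonic, rt, base, offset, size)."""
    mnemonic, size = LOAD_STORE[word >> 26]
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit immediate
        offset -= 0x10000
    return mnemonic, rt, base, offset, size


def effective_address(word, regs):
    """Return (address, aligned) for the access, given a {regnum: value} dict."""
    _, _, base, offset, size = decode_load_store(word)
    addr = (regs[base] + offset) & 0xFFFFFFFF
    return addr, addr % size == 0


# Faulting word <8e05fffc> and $16 (s0) as printed in the dump above.
addr, aligned = effective_address(0x8E05FFFC, {16: 0x021DC71D})
print("%08x aligned=%s" % (addr, aligned))
```

The value in s0 also does not resemble the 8xxxxxxx kernel pointers seen elsewhere in the dump, which may hint that the register held garbage rather than a slightly misaligned but valid pointer.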
[-- Attachment #4: oops2.in --]
[-- Type: text/plain, Size: 1827 bytes --]
Unable to handle kernel paging request at virtual address 00000020, epc == 8010a19c, ra <1>Unable to handle kernel paging request at virtual address 00000062, epc == 80206ce4, ra == 80217c2c
Oops in fault.c:do_page_fault, line 204:
$0 : 00000000 00000001 812efecc 812efe60 00000000 802e0020 0000ab22 812efecc
$8 : c0a80112 00000000 00000000 00000002 00000000 00000000 802e3626 00000000
$16: 812efecc 00001d3c 812ee8a0 813fc620 87156b20 812ee8a0 00000000 812ee8cc
$24: 00000000 85919c47 85918000 85919a50 87156bf8 80217c2c
Hi : 00000000
Lo : 00000e00
epc : 80206ce4 Not tainted
Status: b0045403
Cause : 00008008
Process cc1 (pid: 3366, stackpage=85918000)
Stack: 87156bf8 8021636c c0a80101 00000000 00000000 801f3ee0 812efc80 81296140
81296140 87156b20 87156bf8 81296140 813fc620 00000000 87156bf8 00001d3c
812efee0 00000000 87156b20 00000020 812ee8a0 812ee8cc 80217c2c 80217b38
812ee8a0 802f6c94 801ef2b0 812fb590 87156bf8 87156b20 87156bf8 802c64c0
802b68a0 802b68c0 00800000 00000060 00000010 8021aa50 00000056 812ea140
00000001 ...
Call Trace: [<8021636c>] [<801f3ee0>] [<80217c2c>] [<80217b38>] [<801ef2b0>] [<8021aa50>]
[<8021b62c>] [<80203978>] [<80236c68>] [<8011773c>] [<8011354c>] [<801f421c>]
[<801133dc>] [<8011782c>] [<80112e8c>] [<8010631c>] [<80106630>] [<801065ec>]
[<8019609c>] [<8012b808>] [<8023b1e4>] [<80117448>] [<801a7aec>] [<801a7aec>]
[<8010dd00>] [<8010dd8c>] [<8010debc>] [<8010e2b8>] [<8010e184>] [<80247686>]
[<80108c3c>] [<8012b808>] [<80247628>] [<8010a19c>] [<801033f0>] [<8012b460>]
[<8011ed48>] [<8011ede8>] [<8011ee00>] [<80109938>] [<8011efd0>] ...
Code: 8ea20080 8ea3007c 8e64000c <94850062> 00431023 0045102a 1040002c 8eb1000c 8c8200fc
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
[-- Attachment #5: oops2.out --]
[-- Type: text/plain, Size: 5169 bytes --]
ksymoops 2.4.0 on mips 2.4.17. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.17/ (default)
-m /boot/System.map-2.4.17 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Unable to handle kernel paging request at virtual address 00000020, epc == 8010a19c, ra <1>Unable to handle kernel paging request at virtual address 00000062, epc == 80206ce4, ra == 80217c2c
$0 : 00000000 00000001 812efecc 812efe60 00000000 802e0020 0000ab22 812efecc
$8 : c0a80112 00000000 00000000 00000002 00000000 00000000 802e3626 00000000
$16: 812efecc 00001d3c 812ee8a0 813fc620 87156b20 812ee8a0 00000000 812ee8cc
$24: 00000000 85919c47 85918000 85919a50 87156bf8 80217c2c
epc : 80206ce4 Not tainted
Using defaults from ksymoops -t elf32-tradbigmips -a mips:3000
Status: b0045403
Cause : 00008008
Process cc1 (pid: 3366, stackpage=85918000)
Stack: 87156bf8 8021636c c0a80101 00000000 00000000 801f3ee0 812efc80 81296140
81296140 87156b20 87156bf8 81296140 813fc620 00000000 87156bf8 00001d3c
812efee0 00000000 87156b20 00000020 812ee8a0 812ee8cc 80217c2c 80217b38
812ee8a0 802f6c94 801ef2b0 812fb590 87156bf8 87156b20 87156bf8 802c64c0
802b68a0 802b68c0 00800000 00000060 00000010 8021aa50 00000056 812ea140
00000001 ...
Call Trace: [<8021636c>] [<801f3ee0>] [<80217c2c>] [<80217b38>] [<801ef2b0>] [<8021aa50>]
[<8021b62c>] [<80203978>] [<80236c68>] [<8011773c>] [<8011354c>] [<801f421c>]
[<801133dc>] [<8011782c>] [<80112e8c>] [<8010631c>] [<80106630>] [<801065ec>]
[<8019609c>] [<8012b808>] [<8023b1e4>] [<80117448>] [<801a7aec>] [<801a7aec>]
[<8010dd00>] [<8010dd8c>] [<8010debc>] [<8010e2b8>] [<8010e184>] [<80247686>]
[<80108c3c>] [<8012b808>] [<80247628>] [<8010a19c>] [<801033f0>] [<8012b460>]
[<8011ed48>] [<8011ede8>] [<8011ee00>] [<80109938>] [<8011efd0>] ...
Warning (Oops_trace_line): garbage '...' at end of trace line ignored
Code: 8ea20080 8ea3007c 8e64000c <94850062> 00431023 0045102a 1040002c 8eb1000c 8c8200fc
>>RA; 80217c2c <tcp_transmit_skb+534/610>
>>PC; 80206ce4 <ip_queue_xmit+26c/628> <=====
Trace; 8021636c <tcp_rcv_established+874/8c4>
Trace; 801f3ee0 <net_tx_action+94/160>
Trace; 80217c2c <tcp_transmit_skb+534/610>
Trace; 80217b38 <tcp_transmit_skb+440/610>
Trace; 801ef2b0 <alloc_skb+168/278>
Trace; 8021aa50 <tcp_send_ack+100/114>
Trace; 8021b62c <tcp_delack_timer+1b8/234>
Trace; 80203978 <ip_rcv+27c/524>
Trace; 80236c68 <xprt_release_write+34/6c>
Trace; 8011773c <timer_bh+350/3d4>
Trace; 8011354c <bh_action+34/88>
Trace; 801f421c <net_rx_action+214/3d8>
Trace; 801133dc <tasklet_hi_action+cc/148>
Trace; 8011782c <do_timer+6c/c4>
Trace; 80112e8c <do_softirq+bc/188>
Trace; 8010631c <handle_IRQ_event+80/f4>
Trace; 80106630 <do_IRQ+f0/114>
Trace; 801065ec <do_IRQ+ac/114>
Trace; 8019609c <ll_galileo_irq+c/14>
Trace; 8012b808 <__alloc_pages+70/21c>
Trace; 8023b1e4 <rpc_release_task+214/240>
Trace; 80117448 <timer_bh+5c/3d4>
Trace; 801a7aec <serial_console_write+9c/288>
Trace; 801a7aec <serial_console_write+9c/288>
Trace; 8010dd00 <__call_console_drivers+68/94>
Trace; 8010dd8c <_call_console_drivers+60/70>
Trace; 8010debc <call_console_drivers+120/168>
Trace; 8010e2b8 <release_console_sem+4c/12c>
Trace; 8010e184 <printk+1f8/254>
Trace; 80247686 <mips_io_port_base+12da/2da4>
Trace; 80108c3c <do_page_fault+254/398>
Trace; 8012b808 <__alloc_pages+70/21c>
Trace; 80247628 <mips_io_port_base+127c/2da4>
Trace; 8010a19c <schedule+274/468>
Trace; 801033f0 <ret_from_sys_call+0/34>
Trace; 8012b460 <_alloc_pages+20/2c>
Trace; 8011ed48 <do_anonymous_page+118/154>
Trace; 8011ede8 <do_no_page+64/1c8>
Trace; 8011ee00 <do_no_page+7c/1c8>
Trace; 80109938 <nopage_tlbl+f4/fc>
Trace; 8011efd0 <handle_mm_fault+84/144>
Code; 80206cd8 <ip_queue_xmit+260/628>
00000000 <_PC>:
Code; 80206cd8 <ip_queue_xmit+260/628>
0: 8ea20080 lw v0,128(s5)
Code; 80206cdc <ip_queue_xmit+264/628>
4: 8ea3007c lw v1,124(s5)
Code; 80206ce0 <ip_queue_xmit+268/628>
8: 8e64000c lw a0,12(s3)
Code; 80206ce4 <ip_queue_xmit+26c/628> <=====
c: 94850062 lhu a1,98(a0) <=====
Code; 80206ce8 <ip_queue_xmit+270/628>
10: 00431023 subu v0,v0,v1
Code; 80206cec <ip_queue_xmit+274/628>
14: 0045102a slt v0,v0,a1
Code; 80206cf0 <ip_queue_xmit+278/628>
18: 1040002c beqz v0,cc <_PC+0xcc> 80206da4 <ip_queue_xmit+32c/628>
Code; 80206cf4 <ip_queue_xmit+27c/628>
1c: 8eb1000c lw s1,12(s5)
Code; 80206cf8 <ip_queue_xmit+280/628>
20: 8c8200fc lw v0,252(a0)
Kernel panic: Aiee, killing interrupt handler!
3 warnings issued. Results may not be reliable.
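
The first line of this dump appears to contain two fault reports run together: one for address 00000020 (epc 8010a19c, cut off after "ra") and one for address 00000062, separated by a `<1>` printk level prefix. The register dump and disassembly belong to the second report. One plausible reading, given that the trace contains do_page_fault, printk and serial_console_write frames, is that the second fault arrived while the first was still being printed. The Python sketch below only separates the two reports; the regular expression and names are assumptions made for this illustration.

```python
import re

# First line of oops2.in, copied verbatim from the attachment.
LINE = (
    "Unable to handle kernel paging request at virtual address 00000020, "
    "epc == 8010a19c, ra <1>Unable to handle kernel paging request at "
    "virtual address 00000062, epc == 80206ce4, ra == 80217c2c"
)

# The "ra == ..." part is optional because the first report was truncated.
REPORT = re.compile(
    r"virtual address ([0-9a-f]{8}), epc == ([0-9a-f]{8}), ra(?: == ([0-9a-f]{8}))?"
)


def split_reports(line):
    """Return one dict per fault report found in a possibly interleaved line."""
    reports = []
    for match in REPORT.finditer(line):
        addr, epc, ra = match.groups()
        reports.append({
            "addr": int(addr, 16),
            "epc": int(epc, 16),
            "ra": int(ra, 16) if ra else None,   # None when the report was cut off
        })
    return reports


for report in split_reports(LINE):
    print(report)
```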
[-- Attachment #6: oops3.in --]
[-- Type: text/plain, Size: 1453 bytes --]
Unable to handle kernel paging request at virtual address 00000004, epc == 801289b8, ra == 8016b820
Oops in fault.c:do_page_fault, line 204:
$0 : 00000000 b0045400 00000000 00000000 81207600 81207608 000004c2 00000000
$8 : 813ff000 b0045401 00000000 00000000 00000000 00000000 00000065 87dcfd12
$16: 81207600 000000f0 00000001 812914e8 871ab960 8114f580 00000010 879278e0
$24: 00000000 2af984e0 8719a000 8719bd38 00000000 8016b820
Hi : 00000000
Lo : 00000780
epc : 801289b8 Not tainted
Status: b0045402
Cause : 0000800c
Process rpmq (pid: 666, stackpage=8719a000)
Stack: 8719bd60 00000000 00000000 8017065c 00000000 81205b20 8016b820 8016df98
00000001 81205be0 8719bd60 8719bd60 00000000 879278e0 879278e0 8114f580
871ab960 0000001f 00001000 871ab960 879278e0 8016d870 000001d2 871ab960
00000004 8012b808 00001000 879278e0 00000000 879278e0 8114f580 871ab960
8016e338 879279a0 8114f580 00000000 80121cc8 0000001f 00000006 8012b460
8114f580 ...
Call Trace: [<8017065c>] [<8016b820>] [<8016df98>] [<8016d870>] [<8012b808>] [<8016e338>]
[<80121cc8>] [<8012b460>] [<80121dcc>] [<80121dd8>] [<801226cc>] [<80122a0c>]
[<801fb26c>] [<801fb088>] [<801136e4>] [<80123060>] [<80122f58>] [<8016716c>]
[<80112e8c>] [<8010631c>] [<80131bec>] [<801319e8>] [<80106630>] [<801065ec>]
[<801057e8>] [<8019604c>]
Code: 00000000 8d020004 8d030000 <ac620004> ac430000 8e040008 ac880004 ad040000 ad050004
[-- Attachment #7: oops3.out --]
[-- Type: text/plain, Size: 4107 bytes --]
ksymoops 2.4.0 on mips 2.4.17. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.17/ (default)
-m /boot/System.map-2.4.17 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Unable to handle kernel paging request at virtual address 00000004, epc == 801289b8, ra == 8016b820
$0 : 00000000 b0045400 00000000 00000000 81207600 81207608 000004c2 00000000
$8 : 813ff000 b0045401 00000000 00000000 00000000 00000000 00000065 87dcfd12
$16: 81207600 000000f0 00000001 812914e8 871ab960 8114f580 00000010 879278e0
$24: 00000000 2af984e0 8719a000 8719bd38 00000000 8016b820
epc : 801289b8 Not tainted
Using defaults from ksymoops -t elf32-tradbigmips -a mips:3000
Status: b0045402
Cause : 0000800c
Process rpmq (pid: 666, stackpage=8719a000)
Stack: 8719bd60 00000000 00000000 8017065c 00000000 81205b20 8016b820 8016df98
00000001 81205be0 8719bd60 8719bd60 00000000 879278e0 879278e0 8114f580
871ab960 0000001f 00001000 871ab960 879278e0 8016d870 000001d2 871ab960
00000004 8012b808 00001000 879278e0 00000000 879278e0 8114f580 871ab960
8016e338 879279a0 8114f580 00000000 80121cc8 0000001f 00000006 8012b460
8114f580 ...
Call Trace: [<8017065c>] [<8016b820>] [<8016df98>] [<8016d870>] [<8012b808>] [<8016e338>]
[<80121cc8>] [<8012b460>] [<80121dcc>] [<80121dd8>] [<801226cc>] [<80122a0c>]
[<801fb26c>] [<801fb088>] [<801136e4>] [<80123060>] [<80122f58>] [<8016716c>]
[<80112e8c>] [<8010631c>] [<80131bec>] [<801319e8>] [<80106630>] [<801065ec>]
[<801057e8>] [<8019604c>]
Code: 00000000 8d020004 8d030000 <ac620004> ac430000 8e040008 ac880004 ad040000 ad050004
>>RA; 8016b820 <nfs_create_request+d0/1e0>
>>PC; 801289b8 <kmem_cache_alloc+b8/1ac> <=====
Trace; 8017065c <nfs_flush_file+58/a0>
Trace; 8016b820 <nfs_create_request+d0/1e0>
Trace; 8016df98 <nfs_pagein_inode+50/98>
Trace; 8016d870 <nfs_readpage_async+30/fc>
Trace; 8012b808 <__alloc_pages+70/21c>
Trace; 8016e338 <nfs_readpage+10c/154>
Trace; 80121cc8 <add_to_page_cache_unique+b0/c8>
Trace; 8012b460 <_alloc_pages+20/2c>
Trace; 80121dcc <page_cache_read+ec/11c>
Trace; 80121dd8 <page_cache_read+f8/11c>
Trace; 801226cc <generic_file_readahead+174/1ec>
Trace; 80122a0c <do_generic_file_read+24c/51c>
Trace; 801fb26c <ip_rcv+460/524>
Trace; 801fb088 <ip_rcv+27c/524>
Trace; 801136e4 <__run_task_queue+c8/e4>
Trace; 80123060 <generic_file_read+94/1a0>
Trace; 80122f58 <file_read_actor+0/74>
Trace; 8016716c <nfs_file_read+cc/ec>
Trace; 80112e8c <do_softirq+bc/188>
Trace; 8010631c <handle_IRQ_event+80/f4>
Trace; 80131bec <sys_read+d8/130>
Trace; 801319e8 <sys_lseek+98/b8>
Trace; 80106630 <do_IRQ+f0/114>
Trace; 801065ec <do_IRQ+ac/114>
Trace; 801057e8 <stack_done+1c/38>
Trace; 8019604c <ll_pri_enet_irq+c/14>
Code; 801289ac <kmem_cache_alloc+ac/1ac>
00000000 <_PC>:
Code; 801289ac <kmem_cache_alloc+ac/1ac>
0: 00000000 nop
Code; 801289b0 <kmem_cache_alloc+b0/1ac>
4: 8d020004 lw v0,4(t0)
Code; 801289b4 <kmem_cache_alloc+b4/1ac>
8: 8d030000 lw v1,0(t0)
Code; 801289b8 <kmem_cache_alloc+b8/1ac> <=====
c: ac620004 sw v0,4(v1) <=====
Code; 801289bc <kmem_cache_alloc+bc/1ac>
10: ac430000 sw v1,0(v0)
Code; 801289c0 <kmem_cache_alloc+c0/1ac>
14: 8e040008 lw a0,8(s0)
Code; 801289c4 <kmem_cache_alloc+c4/1ac>
18: ac880004 sw t0,4(a0)
Code; 801289c8 <kmem_cache_alloc+c8/1ac>
1c: ad040000 sw a0,0(t0)
Code; 801289cc <kmem_cache_alloc+cc/1ac>
20: ad050004 sw a1,4(t0)
2 warnings issued. Results may not be reliable.
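
The four instructions around the faulting PC look like a doubly linked list unlink (`next->prev = prev; prev->next = next`) on the entry at t0 = 813ff000, with the link words at offsets 0 and 4. Since v0 and v1 are both zero in the register dump, both link words read back as NULL and the first store lands on address 00000004. The Python sketch below mimics that sequence on a toy memory model. It is an interpretation of the disassembly, not code taken from the kernel, and treating everything below PAGE_SIZE as unmapped is an assumption of the sketch.

```python
PAGE_SIZE = 4096  # assumption: the first page is never mapped for the kernel


class Fault(Exception):
    """Raised when the simulated store touches an unmapped low address."""

    def __init__(self, addr):
        super().__init__("fault at %08x" % addr)
        self.addr = addr


def store(mem, addr, value):
    """Write one word, faulting on addresses inside the first page."""
    if addr < PAGE_SIZE:
        raise Fault(addr)
    mem[addr] = value


def unlink(mem, entry):
    """Mimic the four instructions around the faulting PC in oops3."""
    prev = mem[entry + 4]        # lw v0,4(t0)
    nxt = mem[entry + 0]         # lw v1,0(t0)
    store(mem, nxt + 4, prev)    # sw v0,4(v1)   next->prev = prev
    store(mem, prev + 0, nxt)    # sw v1,0(v0)   prev->next = next


# Entry at t0 = 813ff000 with both link words zero, as v0 and v1 show in the dump.
mem = {0x813FF000: 0, 0x813FF004: 0}
try:
    unlink(mem, 0x813FF000)
except Fault as exc:
    print(exc)
```

If this reading is right, the list entry had been zeroed or was never initialised by the time it was unlinked, which points at memory corruption rather than at a logic error at the faulting PC itself.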
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-27 8:22 Help with OOPSes, anyone? Matthew Dharm
@ 2002-01-27 10:53 ` Jason Gunthorpe
2002-01-27 22:26 ` Matthew Dharm
2002-01-27 17:33 ` Pete Popov
1 sibling, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2002-01-27 10:53 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
On Sun, 27 Jan 2002, Matthew Dharm wrote:
> My instincts are telling me that these are all being caused by the same
> problem, but I'll be damned if I can figure out what that is. Caching is a
> good suspect, but that's just because it's always a good suspect.
I can tell you that I have managed to get 2.4.17 (patched up from the
2.4.15 in the linux_2_4 branch of SGI CVS) running very solidly on a
RM7000 platform. I have carefully inspected the cache code, and I
think that what is in the CVS tree is correct, though a little
over-zealous :> I had to make some tweaks to the cache init on the RM7k,
the existing code is wrong - but this is only important if your PROM does
not do it for you. I can send you this code if you like.
I'm using the Debian user land, 8M of L3 and a custom system controller.
The machine works well enough to build complicated programs, run X stuff,
etc. My board also has 512M of ram, (mapped from 0-512M, so no problems
with highmem..). The box is nfs root'd and I've currently got a 8139
ethernet chip on it.
> In these OOPSes, one is caused by some code in unaligned.c -- I've seen
> several (many) like this, tho I only captured and decoded one. The code in
Many of the oops's I've seen (while getting this working) come from
unaligned.c - haven't investigated why yet - they might actually be kernel
unaligned memory references.
While working on the SR7100, I noticed that various sorts of problems that
result in a subtly broken system bus caused random faults in unaligned.c.
> -- I FTPed the SRPM for wget and built it without any problems. Heck, it
> even works! But when I try to build something bigger (say, ncftp or
> glibc), it dies an ugly death. Heck, I could FTP, build, and use ksymoops
Just tried for you:
mips:/tmp/ram# apt-get source -b ncftp
[..]
dpkg-deb: building package `ncftp' in `../ncftp_3.1.1-3_mipsel.deb'.
mips:/tmp/ram# uname -a
Linux mips 2.4.15-greased-turkey #407 Thu Jan 17 19:20:18 MST 2002 mips unknown
mips:/tmp/ram# cat /proc/cpuinfo
processor : 0
cpu model : RM7000 V3.2 FPU V2.0
BogoMIPS : 346.20
[..]
mips:/tmp/ram# free
total used free shared buffers cached
Mem: 514100 124996 389104 0 16 98604
> hopes that will fix the problem. I'm thinking about trying
> CONFIG_MIPS_UNCACHED, but I don't know if that works on an RM7000 processor
It does.
> -- the L1 and L2 are built-in to the processor, and I don't think the L1
> can be deactivated. Then again, I don't know how CONFIG_MIPS_UNCACHED
They can.. It is worth trying without the L3 cache at the very least.
I see your boards have the GT system controllers. You may want to validate
they are configured correctly, you can get all sorts of really screwy
results if they are not - there are lots of errata for those chips, and
some models have a very intolerant (electrically) sdram controller.
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-27 10:53 ` Jason Gunthorpe
@ 2002-01-27 22:26 ` Matthew Dharm
2002-01-28 2:39 ` Jason Gunthorpe
0 siblings, 1 reply; 18+ messages in thread
From: Matthew Dharm @ 2002-01-27 22:26 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: linux-mips
Interesting... did you try the 2.4.17 that's in the SGI CVS? That's what
I'm using....
Our PROM does configure the cache for us, but I'd like to see the code
anyway. Might be insightful.
We're pretty sure our GT-64120 is setup properly, as we use the same
parameters under vxWorks and OpenBSD without problem.
The particular board I'm using has no L3 on it. But I will try the
CONFIG_MIPS_UNCACHED option to see what happens.
Matt
On Sun, Jan 27, 2002 at 03:53:34AM -0700, Jason Gunthorpe wrote:
>
> On Sun, 27 Jan 2002, Matthew Dharm wrote:
>
> > My instincts are telling me that these are all being caused by the same
> > problem, but I'll be damned if I can figure out what that is. Caching is a
> > good suspect, but that's just because it's always a good suspect.
>
> I can tell you that I have managed to get 2.4.17 (patched up from the
> 2.4.15 in the linux_2_4 branch of SGI CVS) running very solidly on a
> RM7000 platform. I have carefully inspected the cache code, and I
> think that what is in the CVS tree is correct, though a little
> over-zealous :> I had to make some tweaks to the cache init on the RM7k,
> the existing code is wrong - but this is only important if your PROM does
> not do it for you. I can send you this code if you like.
>
> I'm using the Debian user land, 8M of L3 and a custom system controller.
> The machine works well enough to build complicated programs, run X stuff,
> etc. My board also has 512M of ram, (mapped from 0-512M, so no problems
> with highmem..). The box is nfs root'd and I've currently got a 8139
> ethernet chip on it.
>
> > In these OOPSes, one is caused by some code in unaligned.c -- I've seen
> > several (many) like this, tho I only captured and decoded one. The code in
>
> Many of the oops's I've seen (while getting this working) come from
> unaligned.c - haven't investigated why yet - they might actually be kernel
> unaligned memory references.
>
> While working on the SR7100, I noticed that various sorts of problems that
> result in a subtly broken system bus caused random faults in unaligned.c
>
> > -- I FTPed the SRPM for wget and built it without any problems. Heck, it
> > even works! But when I try to build something bigger (say, ncftp or
> > glibc), it dies an ugly death. Heck, I could FTP, build, and use ksymoops
>
> Just tried for you:
>
> mips:/tmp/ram# apt-get source -b ncftp
> [..]
> dpkg-deb: building package `ncftp' in `../ncftp_3.1.1-3_mipsel.deb'.
> mips:/tmp/ram# uname -a
> Linux mips 2.4.15-greased-turkey #407 Thu Jan 17 19:20:18 MST 2002 mips unknown
> mips:/tmp/ram# cat /proc/cpuinfo
> processor : 0
> cpu model : RM7000 V3.2 FPU V2.0
> BogoMIPS : 346.20
> [..]
> mips:/tmp/ram# free
> total used free shared buffers cached
> Mem: 514100 124996 389104 0 16 98604
>
> > hopes that will fix the problem. I'm thinking about trying
> > CONFIG_MIPS_UNCACHED, but I don't know if that works on an RM7000 processor
>
> It does.
>
> > -- the L1 and L2 are built-in to the processor, and I don't think the L1
> > can be deactivated. Then again, I don't know how CONFIG_MIPS_UNCACHED
>
> They can.. It is worth trying without the L3 cache at the very least.
>
> I see your boards have the GT system controllers. You may want to validate
> they are configured correctly, you can get all sorts of really screwy
> results if they are not - there are lots of errata for those chips, and
> some models have a very intolerant (electrically) sdram controller.
>
> Jason
>
>
>
--
Matthew Dharm Work: mdharm@momenco.com
Senior Software Designer, Momentum Computer
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-27 22:26 ` Matthew Dharm
@ 2002-01-28 2:39 ` Jason Gunthorpe
2002-01-28 23:31 ` Matthew Dharm
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2002-01-28 2:39 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
On Sun, 27 Jan 2002, Matthew Dharm wrote:
> Interesting... did you try the 2.4.17 that's in the SGI CVS? That's what
> I'm using....
Hmm. Whoops. Wrong CVS tag.
Right, 2.4.17 out of CVS w/ my patch set is working OK, at least it
compiles ncftp, and runs my usual gamut of things.
I'll send you my modified cache code..
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-28 2:39 ` Jason Gunthorpe
@ 2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:31 ` Matthew Dharm
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Matthew Dharm @ 2002-01-28 23:31 UTC (permalink / raw)
To: linux-mips
Well, here's the latest test results...
The 2.4.0 kernel from MontaVista seems to work just fine. Of course,
it doesn't have support for the full range of interrupts, but that's a
separate matter. But it doesn't crash under big compiles.
2.4.17 with CONFIG_MIPS_UNCACHED crashes. It takes longer, but that
may just be a function of it running so much slower. The BogoMIPS
drops by a factor of 100. Ouch.
So it doesn't look like a cache problem after all. And it does
suggest that something introduced between 2.4.0 and .17 is what broke
things. But what that is, I have no idea.
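
If the breakage really was introduced somewhere between 2.4.0 and 2.4.17, a binary search over the intermediate releases would narrow it down in about five builds. The Python sketch below shows the idea only. It assumes each intermediate version can actually be built and booted on this board, and that once the problem appears it stays present in later versions; neither is confirmed in this thread. The `is_bad` callback is hypothetical and stands for "boot this kernel and run a big compile".

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid             # breakage is at mid or earlier
        else:
            lo = mid             # breakage is after mid
    return versions[hi]


# Candidate versions 2.4.0 through 2.4.17.
versions = ["2.4.%d" % n for n in range(18)]

# Stand-in predicate for illustration: pretend the problem first appears in 2.4.9.
tested = []

def pretend_is_bad(version):
    tested.append(version)
    return int(version.split(".")[2]) >= 9

print(first_bad(versions, pretend_is_bad), "after", len(tested), "test builds")
```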
I'm going to try Jason's modified cache code just in case, but I doubt
that will change anything. We'll have to see, tho.
Does anyone have any other suggestions to try? I'm starting to wonder
if perhaps the PROM isn't setting up the SDRAM properly, but that
conflicts with the working 2.4.0 kernel -- the PROM is the same in
both cases, so I would expect a PROM error to affect both versions.
I'm running out of ideas here... anyone?
Matt
--
Matthew D. Dharm Senior Software Designer
Momentum Computer Inc. 1815 Aston Ave. Suite 107
(760) 431-8663 X-115 Carlsbad, CA 92008-7310
Momentum Works For You www.momenco.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:31 ` Matthew Dharm
@ 2002-01-28 23:50 ` Matthew Dharm
2002-01-28 23:50 ` Matthew Dharm
2002-01-28 23:54 ` Pete Popov
2 siblings, 1 reply; 18+ messages in thread
From: Matthew Dharm @ 2002-01-28 23:50 UTC (permalink / raw)
To: linux-mips
Well, Jason's modified cache code doesn't help either.
I'm still open to suggestions....
Ralf, are you in the same location as your board? I'm curious to hear
your opinion of this.
Matthew Dharm
--
Matthew D. Dharm Senior Software Designer
Momentum Computer Inc. 1815 Aston Ave. Suite 107
(760) 431-8663 X-115 Carlsbad, CA 92008-7310
Momentum Works For You www.momenco.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:50 ` Matthew Dharm
@ 2002-01-28 23:54 ` Pete Popov
2002-01-29 0:09 ` Matthew Dharm
2002-01-29 2:03 ` Matthew Dharm
2 siblings, 2 replies; 18+ messages in thread
From: Pete Popov @ 2002-01-28 23:54 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
On Mon, 2002-01-28 at 15:31, Matthew Dharm wrote:
> Well, here's the latest test results...
>
> The 2.4.0 kernel from MontaVista seems to work just fine. Of course,
> it doesn't have support for the full range of interrupts, but that's a
> separate matter. But it doesn't crash under big compiles.
2.4.0 from MontaVista? Do you mean the very first release, which was
2.4.0-test9?
> 2.4.17 with CONFIG_MIPS_UNCACHED crashes. It takes longer, but that
> may just be a function of it running so much slower. The BogoMIPS
> drops by a factor of 100. Ouch.
>
> So it doesn't look like a cache problem after all. And it does
> suggest that something introduced between 2.4.0 and .17 is what broke
> things. But what that is, I have no idea.
>
> I'm going to try Jason's modified cache code just in case, but I doubt
> that will change anything. We'll have to see, tho.
>
> Does anyone have any other suggestions to try? I'm starting to wonder
> if perhaps the PROM isn't setting up the SDRAM properly, but that
> conflicts with the working 2.4.0 kernel -- the PROM is the same in
> both cases, so I would expect a PROM error to affect both versions.
>
> I'm running out of ideas here... anyone?
If you're absolutely sure 2.4.0-test9 doesn't crash (you ran the test
"enough" times), perhaps you can start testing kernels between 2.4.0 and
2.4.17. And, you did get rid of the 'wait' instruction in 2.4.17,
right ;-)?
Pete
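The suggestion to test kernels between 2.4.0 and 2.4.17 amounts to a bisection over the release list. A minimal Python sketch, assuming the failure is monotonic (once a release crashes under load, every later one does too). The names `first_bad` and `crashes_under_load` are hypothetical, and the predicate is only a stand-in for booting each kernel and running a big compile by hand:

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions[0] is good, versions[-1] is bad, and that the
    failure never disappears again once it has been introduced.
    """
    lo, hi = 0, len(versions) - 1      # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid                   # breakage is at mid or earlier
        else:
            lo = mid                   # breakage is after mid
    return versions[hi]


versions = [f"2.4.{n}" for n in range(18)]   # 2.4.0 .. 2.4.17


# Hypothetical stand-in for "boot this kernel and run a big compile".
# The default first_broken=9 is made up purely to exercise the search.
def crashes_under_load(version, first_broken=9):
    return int(version.split(".")[2]) >= first_broken


print(first_bad(versions, crashes_under_load))
```

With 18 releases this needs at most five test runs instead of up to sixteen. It only helps if every intermediate kernel boots far enough to run the load test and fails in the same way.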
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-28 23:54 ` Pete Popov
@ 2002-01-29 0:09 ` Matthew Dharm
2002-01-29 0:22 ` Pete Popov
2002-01-29 2:03 ` Matthew Dharm
1 sibling, 1 reply; 18+ messages in thread
From: Matthew Dharm @ 2002-01-29 0:09 UTC (permalink / raw)
To: Pete Popov; +Cc: linux-mips
Frankly, I'm not entirely certain which version the Montavista kernel
is. We were supposed to be doing the software validation for
PMC-Sierra (who contracted to Montavista for the work), so this is one
of the later kernels from that process. But I really don't know
exactly which one...
As for the 'wait' thing... forgot to try that one. How does one go
about disabling the wait instruction, anyway?
Matt
--
Matthew D. Dharm Senior Software Designer
Momentum Computer Inc. 1815 Aston Ave. Suite 107
(760) 431-8663 X-115 Carlsbad, CA 92008-7310
Momentum Works For You www.momenco.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-29 0:09 ` Matthew Dharm
@ 2002-01-29 0:22 ` Pete Popov
0 siblings, 0 replies; 18+ messages in thread
From: Pete Popov @ 2002-01-29 0:22 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
On Mon, 2002-01-28 at 16:09, Matthew Dharm wrote:
> Frankly, I'm not entirely certain which version the Montavista kernel
> is. We were supposed to be doing the software validation for
> PMC-Sierra (who contracted to Montavista for the work), so this is one
> of the later kernels from that process. But I really don't know
> exactly which one...
It's probably 2.4.2 based, but it could be 2.4.0-test9. On the target,
type "uname --all"
> As for the 'wait' thing... forgot to try that one. How does one go
> about disabling the wait instruction, anyway?
arch/mips/kernel/setup.c, in the function check_wait(), ifdef-out the
RM7000 case so that 'wait' is not available.
Pete
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-28 23:54 ` Pete Popov
2002-01-29 0:09 ` Matthew Dharm
@ 2002-01-29 2:03 ` Matthew Dharm
2002-01-29 2:29 ` Pete Popov
1 sibling, 1 reply; 18+ messages in thread
From: Matthew Dharm @ 2002-01-29 2:03 UTC (permalink / raw)
To: Pete Popov; +Cc: linux-mips
Well, more data.
2.4.0 won't boot. What I get on the serial console is:
PMON> boot 192.168.1.1:/tftpboot/mdharm/vmlinux-2.4.0
Loading file: 192.168.1.1:/tftpboot/mdharm/vmlinux-2.4.0 (elf)
0x80100000/1490576 + 0x8026be90/127504(z) + 4094 syms\
Entry address is 801005a8
PMON> g
Linux version 2.4.0 (mdharm@GoldenGate) (gcc version egcs-2.91.66
19990314 (egcs-1.1.2 release)) #1 Mon Jan 28 16:37:45 PST 2002
Determined physical RAM map:
memory: 08000000 @ 00000000 (usable)
On node 0 totalpages: 32768
zone(0): 4096 pages.
zone(1): 28672 pages.
zone(2): 0 pages.
Kernel command line:
and then it just stops. Go figure.
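The numbers in this boot log are at least self-consistent, which suggests the kernel sees the full 128MB before it stops. A quick check in Python, assuming 4 KiB pages (the log does not state the page size):

```python
PAGE_SIZE = 4096                  # assumption: 4 KiB pages

ram_bytes = 0x08000000            # "memory: 08000000 @ 00000000 (usable)"
total_pages = 32768               # "On node 0 totalpages: 32768"
zone_pages = [4096, 28672, 0]     # zone(0), zone(1), zone(2)


def mib(n_bytes):
    """Convert a byte count to whole mebibytes."""
    return n_bytes // (1024 * 1024)


print(mib(ram_bytes))                          # size of the RAM map in MiB
print(ram_bytes // PAGE_SIZE == total_pages)   # RAM map matches totalpages?
print(sum(zone_pages) == total_pages)          # zones add up to totalpages?
print(mib(zone_pages[0] * PAGE_SIZE))          # size of zone(0) in MiB
```

Under that assumption the RAM map is 128 MiB, the zone sizes sum to the reported page total, and zone(0) covers the first 16 MiB.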
2.4.5 crashes, but in a different way. Now I get a stream of:
Got ibe at 2ab60b3c.
Instruction bus error, epc == 2ab60b3c, ra == 2ab5fcac
Over and over again. I should say the system didn't actually crash,
but every command I run at the shell gives a Segmentation Fault. And,
even when I'm not trying to run applications, the above message is
streaming out the serial port.
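Both addresses in that message lie below 0x80000000, which on a 32-bit MIPS kernel is user space, so the bus error appears to be taken on a user-mode instruction fetch rather than inside the kernel. A small sketch of the standard MIPS32 segment layout; the function name is only for illustration:

```python
def mips32_segment(addr):
    """Classify a 32-bit MIPS virtual address by segment."""
    if addr < 0x80000000:
        return "kuseg"         # user space, mapped through the TLB
    if addr < 0xA0000000:
        return "kseg0"         # kernel, unmapped, cached
    if addr < 0xC0000000:
        return "kseg1"         # kernel, unmapped, uncached
    return "kseg2/kseg3"       # kernel, mapped through the TLB


# Addresses taken from the logs in this message.
for name, addr in [("ibe epc", 0x2AB60B3C),
                   ("ibe ra", 0x2AB5FCAC),
                   ("kernel entry", 0x801005A8)]:
    print(f"{name:12s} {addr:08x} {mips32_segment(addr)}")
```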
2.4.10 also does something bad. Rebuilding ncftp causes a bus error
during the compilation process. As with 2.4.5, no applications will
now run, but this time they all give "Illegal Instruction (core
dumped)". Nothing appeared on the serial console.
2.4.14 crashes with the following OOPS (decoded for your enjoyment):
ksymoops 2.4.0 on mips 2.4.14. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.14/ (default)
-m System.map (specified)
Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at virtual address 00000020,
epc == 80128228, ra == 80123c28
$0 : 00000000 1000ee88 00000177 81209a20 8129fa80 00000067 2ab06000
00000000
$8 : 90045401 1000001f 00000000 802635a8 00000000 8025b840 87407e90
81205b64
$16: 00000000 00000067 86f54e40 00000000 86dbc520 86f54e40 2ab6d000
00000177
$24: 00000001 006b6c30 86d20000 86d21dd8 7fd73d98
80123c28
epc : 80128228 Not tainted
Using defaults from ksymoops -t elf32-tradbigmips -a mips:3000
Status: 90045403
Cause : 00008008
Process cc1 (pid: 3465, stackpage=86d20000)
Stack: 80123b70 7fd75db8 7fd74dc8 7fd74db8 00000000 2ab6d000 86f54e40
86318140
00000000 86181db4 2ab6d000 8631815c 7fd73d98 80123c28 86d21e78
7fd74db8
0070c248 86d21f30 7fd71ca8 00000000 00000000 2ab6d000 86318140
00000000
86f54e40 86d21f30 00000000 80123df0 86f54720 86d20000 8631815c
00000045
86181db4 ffffffff 86f54e40 86d20000 8631815c 2ab6d000 86318140
8010fcb4
0000000b ...
Call Trace: [<80123b70>] [<80123c28>] [<80123df0>] [<8010fcb4>]
[<8011db50>] [<8011dc90>]
[<8011e0e0>] [<8010d76c>] [<80110b98>] [<8010a760>]
Warning (Oops_read): Code line not seen, dumping what data is
available
>>RA; 80123c28 <do_no_page+7c/1c0>
>>PC; 80128228 <filemap_nopage+64/304> <=====
Trace; 80123b70 <do_anonymous_page+110/14c>
Trace; 80123c28 <do_no_page+7c/1c0>
Trace; 80123df0 <handle_mm_fault+84/144>
Trace; 8010fcb4 <do_page_fault+17c/398>
Trace; 8011db50 <deliver_signal+24/90>
Trace; 8011dc90 <send_sig_info+d4/120>
Trace; 8011e0e0 <send_sig+18/24>
Trace; 8010d76c <do_IRQ+ac/114>
Trace; 80110b98 <nopage_tlbl+f4/fc>
Trace; 8010a760 <signal_return+1c/3c>
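The Cause value in the dump can be decoded by hand. In the MIPS Cause register, bits 6..2 hold the exception code and bits 15..8 the pending interrupt lines. A short Python sketch; the function name and table are illustrative, and only the more common exception codes are listed:

```python
# Exception codes from the MIPS Cause register (ExcCode field).
EXC_CODES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU",
    12: "Ov", 13: "Tr",
}


def decode_cause(cause):
    """Return (exception name, list of pending interrupt line numbers)."""
    exc = (cause >> 2) & 0x1F                                   # bits 6..2
    pending = [i for i in range(8) if cause & (1 << (8 + i))]   # bits 15..8
    return EXC_CODES.get(exc, f"code {exc}"), pending


print(decode_cause(0x00008008))   # Cause value from the 2.4.14 oops
```

For 0x00008008 this gives TLBL, a TLB miss on a load or instruction fetch, with interrupt line 7 pending. That is consistent with the fault address 0x00000020, which looks like a load through a NULL pointer plus a small offset inside filemap_nopage.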
And 2.4.17 with the wait instruction turned off still crashes.
The Montavista kernel (which claims to be 2.4.0 #5 built by jsun)
seems to work... I've done several recompiles on it, and lots of I/O
traffic with no problems. Unfortunately, I don't have the source code
to this particular kernel... tho I believe that Montavista is required
to release their source code by the GPL.
Tho here's a question: What is the best compiler to build a kernel
with? I've built all mine with egcs-2.91.66 which I downloaded from
oss.sgi.com a while ago.
Matt
--
Matthew D. Dharm Senior Software Designer
Momentum Computer Inc. 1815 Aston Ave. Suite 107
(760) 431-8663 X-115 Carlsbad, CA 92008-7310
Momentum Works For You www.momenco.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
2002-01-29 2:03 ` Matthew Dharm
@ 2002-01-29 2:29 ` Pete Popov
2002-01-29 6:18 ` Matthew Dharm
0 siblings, 1 reply; 18+ messages in thread
From: Pete Popov @ 2002-01-29 2:29 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
> And 2.4.17 with the wait instruction turned off still crashes.
> The Montavista kernel (which claims to be 2.4.0 #5 built by jsun)
> seems to work...
Strange, that must have been some interim build Jun did.
> I've done several recompiles on it, and lots of I/O
> traffic with no problems. Unfortunately, I don't have the source code
> to this particular kernel... tho I believe that Montavista is required
> to release their source code by the GPL.
All of the open source work we do we push out to the community tree
immediately. That's a rule we live by and there are no exceptions. The Ocelot
code was pushed out back then. Since then I've seen lots of additions to that
directory and obviously something got broken.
QED did receive an Alliance CD with the Ocelot LSP on it, so they should
be able to provide you with a copy of the cdimage, including the
source. The kernel will be 2.4.2-based, though; I don't know where
the 2.4.0 came from.
> Tho here's a question: What is the best compiler to build a kernel
> with? I've built all mine with egcs-2.91.66 which I downloaded from
> oss.sgi.com a while ago.
MontaVista's, but I'm biased ;-) The toolchain will be on the CD as
well.
Pete
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-29 2:29 ` Pete Popov
@ 2002-01-29 6:18 ` Matthew Dharm
2002-01-29 16:47 ` Pete Popov
0 siblings, 1 reply; 18+ messages in thread
From: Matthew Dharm @ 2002-01-29 6:18 UTC (permalink / raw)
To: Pete Popov; +Cc: linux-mips
Do you know who, precisely, got the CD? I'm going to try to chase this
down, but exact names would be helpful.
Also, when you push that material out to the community, when did you do
that? That is, if your work is 2.4.2-based, is it reasonable to assume
that 2.4.2 or 2.4.3 in the CVS repository will work? Or do you take a more
"fire and forget" sort of approach?
Matthew Dharm
--
Matthew Dharm Work: mdharm@momenco.com
Senior Software Designer, Momentum Computer
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-29 6:18 ` Matthew Dharm
@ 2002-01-29 16:47 ` Pete Popov
0 siblings, 0 replies; 18+ messages in thread
From: Pete Popov @ 2002-01-29 16:47 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips, Manoj Ekbote
On Mon, 2002-01-28 at 22:18, Matthew Dharm wrote:
> Do you know who, precisely, got the CD? I'm going to try to chase this
> down, but exact names would be helpful.
I believe Manoj has access to it and I added him to the CC without
asking ;-)
> Also, when you push that material out to the community, when did you do
> that? That is, if your work is 2.4.2-based, is it reasonable to assume
> that 2.4.2 or 2.4.3 in the CVS repository will work?
No, not necessarily. Sometimes I push out patches for new boards even
if the support is not fully baked yet. But typically I wait until I've
got something useful going. That means that the work might have been
done locally, on a 2.4.2 snapshot, and submitted once it was stable.
However, oss might have been up to 2.4.5 at that point; that, and
whatever delay is introduced because Ralf is too busy to immediately
apply the patches means that a MontaVista 2.4.2 based release does not
necessarily equal oss 2.4.2.
> Or do you take a more "fire and forget" sort of approach?
I'm not sure what you mean by that. Hopefully the above is clear
enough.
Pete
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-27 8:22 Help with OOPSes, anyone? Matthew Dharm
2002-01-27 10:53 ` Jason Gunthorpe
@ 2002-01-27 17:33 ` Pete Popov
2002-01-27 22:24 ` Matthew Dharm
1 sibling, 1 reply; 18+ messages in thread
From: Pete Popov @ 2002-01-27 17:33 UTC (permalink / raw)
To: Matthew Dharm; +Cc: linux-mips
> But, under certain conditions, the kernel OOPSes. Attached to this message
> are a few of those OOPSes (serial console is wonderful!) along with the
> ksymoops output. I think the read_lsmod() warning is bogus, because there
> are, actually, no modules loaded.
>
> My instincts are telling me that these are all being caused by the same
> problem, but I'll be damned if I can figure out what that is. Caching is a
> good suspect, but that's just because it's always a good suspect.
Native compiles have indeed proven a great way to shake out hardware and
software bugs.
One suggestion. The rm7k, at least in some of the silicon versions, has
hardware errata with the 'wait' instruction, used in the cpu_idle()
loop. The CPU I have on one of the EV96100 boards, in combination with
the gt96100, will hang hard every time if I don't disable the use of
'wait'. So while this bug might not have anything to do with what
you're observing, I would ifdef-out the 'wait' instruction in
check_wait(), just to be sure that that's not the cause or one of the
problems.
Pete
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Help with OOPSes, anyone?
2002-01-27 17:33 ` Pete Popov
@ 2002-01-27 22:24 ` Matthew Dharm
0 siblings, 0 replies; 18+ messages in thread
From: Matthew Dharm @ 2002-01-27 22:24 UTC (permalink / raw)
To: Pete Popov; +Cc: linux-mips
Well, we're using very late RM7000 silicon, so I doubt that's the problem.
But it's a good thing to look at, anyway.
Tho it kinda conflicts with the datapoint that we actually had a stable
kernel on this hardware before. Tho, like I said, that's not much of a
datapoint -- more testing coming!
Matt
--
Matthew Dharm Work: mdharm@momenco.com
Senior Software Designer, Momentum Computer
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: Help with OOPSes, anyone?
@ 2002-01-29 18:27 Manoj Ekbote
0 siblings, 0 replies; 18+ messages in thread
From: Manoj Ekbote @ 2002-01-29 18:27 UTC (permalink / raw)
To: 'Pete Popov', Matthew Dharm; +Cc: linux-mips
Yes, I have access to the CD that has MontaVista's kernel (2.4.2) on it.
Manoj
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2002-01-29 20:33 UTC | newest]
Thread overview: 18+ messages
2002-01-27 8:22 Help with OOPSes, anyone? Matthew Dharm
2002-01-27 10:53 ` Jason Gunthorpe
2002-01-27 22:26 ` Matthew Dharm
2002-01-28 2:39 ` Jason Gunthorpe
2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:31 ` Matthew Dharm
2002-01-28 23:50 ` Matthew Dharm
2002-01-28 23:50 ` Matthew Dharm
2002-01-28 23:54 ` Pete Popov
2002-01-29 0:09 ` Matthew Dharm
2002-01-29 0:22 ` Pete Popov
2002-01-29 2:03 ` Matthew Dharm
2002-01-29 2:29 ` Pete Popov
2002-01-29 6:18 ` Matthew Dharm
2002-01-29 16:47 ` Pete Popov
2002-01-27 17:33 ` Pete Popov
2002-01-27 22:24 ` Matthew Dharm
-- strict thread matches above, loose matches on Subject: below --
2002-01-29 18:27 Manoj Ekbote