* [B.A.T.M.A.N.] Batman gateway lock ups
@ 2008-09-05 15:01 Outback Dingo
2008-09-08 7:08 ` Sven Eckelmann
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Outback Dingo @ 2008-09-05 15:01 UTC (permalink / raw)
To: b.a.t.m.a.n
[-- Attachment #1: Type: text/plain, Size: 2297 bytes --]
see pastebin
http://www.pastebin.ca/1194874
pertinent info
dmesg | grep 'batgat loaded'
batgat: [init_module:96] batgat loaded rv1025
uname -a
Linux nightwing 2.6.23.16 #16 Tue Apr 22 20:00:17 ART 2008 mips unknown
root@nightwing:~# batmand -v
WARNING: You are using the unstable batman branch. If you are interested in
*using* batman get the latest stable release !
B.A.T.M.A.N. 0.3-beta (compatibility version 5)
lsmod
Module Size Used by Tainted: P
sch_htb 14048 2
ath_ahb 103616 0
wlan_xauth 480 0
wlan_wep 4000 0
wlan_tkip 9856 0
wlan_ccmp 5440 2
wlan_acl 1920 0
ath_rate_minstrel 8352 1
ath_hal 136832 3 ath_ahb,ath_rate_minstrel
wlan_scan_sta 8768 1
wlan_scan_ap 6656 0
wlan 152464 10 ath_ahb,wlan_xauth,wlan_wep,wlan_tkip,wlan_ccmp,wlan_acl,ath_rate_minstrel,wlan_scan_sta,wlan_scan_ap
batgat 10944 1
ipt_iprange 672 0
ipt_TOS 832 0
ipt_TTL 928 0
xt_MARK 960 3
ipt_ECN 1472 0
xt_CLASSIFY 640 0
ipt_ttl 704 0
ipt_tos 544 0
ipt_time 1568 0
xt_tcpmss 1088 0
xt_statistic 832 0
xt_mark 672 7
xt_mac 736 3
xt_length 736 0
ipt_ecn 1024 0
xt_DSCP 1056 0
xt_dscp 832 0
imq 2096 0
ipt_IMQ 672 2
xt_string 896 0
xt_layer7 9840 0
ipt_ipp2p 6784 0
ipt_LOG 4640 0
xt_CHAOS 1792 0
xt_DELUDE 2624 1
xt_TARPIT 2816 1
xt_quota 800 0
xt_portscan 2016 0
xt_pkttype 704 0
xt_physdev 1488 0
ipt_owner 800 0
iptable_raw 832 0
xt_NOTRACK 832 0
xt_CONNMARK 1088 0
ipt_recent 4992 0
xt_helper 992 0
xt_conntrack 1312 0
xt_connmark 832 0
xt_connbytes 1312 0
tun 6592 0
[-- Attachment #2: Type: text/html, Size: 9985 bytes --]
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-05 15:01 [B.A.T.M.A.N.] Batman gateway lock ups Outback Dingo
@ 2008-09-08 7:08 ` Sven Eckelmann
2008-09-08 21:18 ` Sven Eckelmann
2009-07-21 20:48 ` Simon Wunderlich
2 siblings, 0 replies; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-08 7:08 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 1742 bytes --]
On Friday 05 September 2008 17:01:24 Outback Dingo wrote:
> [...]
> http://www.pastebin.ca/1194874
>
> pertinent info
> dmesg | grep 'batgat loaded'
> batgat: [init_module:96] batgat loaded rv1025
> uname -a
> Linux nightwing 2.6.23.16 #16 Tue Apr 22 20:00:17 ART 2008 mips unknown
> root@nightwing:~# batmand -v
> WARNING: You are using the unstable batman branch. If you are interested in
> *using* batman get the latest stable release !
> B.A.T.M.A.N. 0.3-beta (compatibility version 5)
> [..]
Thanks for your report. I looked a bit at the call stack, but it is really hard to guess the functions when you don't have the symbol table. It looks like most of the functions are kernel thread/scheduling related. Only one function, <c00c8650>, is from a module. Can you please send the /proc/modules file so we can find out which module it belongs to and check whether this could be batgat related?
Further investigation could be done by checking the symbols of batgat.ko with nm - but this would not help very much because the module is stripped. Maybe someone else has a good idea.
Now to something you said on the nightwing mailing list:
> i think the version of batgat being used has a bug as confirmed by someone in
> the batman irc room, note only gateways have this issue, clients do not
> crash at all.
If you mean me (Lazhur on irc) then I have to say that I am neither a developer nor a spokesman of b.a.t.m.a.n., and I never confirmed that it is batgat related - only that something crashed/locked up. I cannot find any crash related bug fixes in the batgat trunk directory - so it doesn't seem to be a known problem (if it is a batgat problem at all).
Best regards
Sven Eckelmann
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-05 15:01 [B.A.T.M.A.N.] Batman gateway lock ups Outback Dingo
2008-09-08 7:08 ` Sven Eckelmann
@ 2008-09-08 21:18 ` Sven Eckelmann
2008-09-08 21:45 ` Sven Eckelmann
2008-09-09 11:26 ` Simon Wunderlich
2009-07-21 20:48 ` Simon Wunderlich
2 siblings, 2 replies; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-08 21:18 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 2135 bytes --]
Ok, I got the /proc/modules file now. The current situation is the following: it crashes inside the batman module at position 0x00000aa4
a60: 3c020000 lui v0,0x0
a64: 8c500024 lw s0,36(v0)
a68: 24420024 addiu v0,v0,36
a6c: 12020014 beq s0,v0,ac0 <cleanup_module+0x610>
a70: 3c040000 lui a0,0x0
a74: 3c050000 lui a1,0x0
a78: 3c020000 lui v0,0x0
a7c: 24840000 addiu a0,a0,0
a80: 24a50088 addiu a1,a1,136
a84: 24420000 addiu v0,v0,0
a88: 0040f809 jalr v0
a8c: 24060283 li a2,643
a90: 8e040004 lw a0,4(s0)
a94: 8e030000 lw v1,0(s0)
a98: 3c020010 lui v0,0x10
a9c: 34420100 ori v0,v0,0x100
aa0: 8e110008 lw s1,8(s0)
aa4: ac830000 sw v1,0(a0)
aa8: ae020000 sw v0,0(s0)
aac: 3c020020 lui v0,0x20
ab0: 34420200 ori v0,v0,0x200
ab4: ac640004 sw a0,4(v1)
This is part of the compiled version of packet_recv_thread. Due to the optimizations done I cannot say where exactly the problem lies.
I think the code of get_ip_addr() got inlined in packet_recv_thread and we need to search for the crash inside of it at list_del(&entry->list);
I would also say that the real crash is inside __list_del, where prev and next are set. To check it, look at LIST_POISON1 and LIST_POISON2 inside poison.h of the current linux kernel. You will notice that the values are 0x00100100 and 0x00200200 == address of the failed paging request. The list poisoning is done in list_del after calling __list_del (it is the sequence lui, ori, sw in the asm snippet). So could it be that we have a poisoned entry inside the list?
This could for example happen when we get scheduled (please notice that the optimizer reordered many instructions) while another part of the program is deleting entries. I haven't checked the rest of the code to see if that really could happen, but that is my current idea.
So for better readability the callstack:
- packet_recv_thread
  - get_ip_addr from gateway.c:401
    - list_del from gateway.c:645
      - __list_del
Best regards
Sven Eckelmann
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
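A minimal userspace sketch of the list poisoning discussed in the message above. It assumes the LIST_POISON1/LIST_POISON2 values quoted there (0x00100100 and 0x00200200); the functions below are simplified re-implementations written for illustration and are not the kernel or batgat sources. The point is only that an entry which has already gone through list_del carries the poison values, so a second list_del on the same entry would store through the address 0x00200200.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumption: poison values as quoted in the thread (linux/poison.h). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* An empty list head points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item directly after head, like the kernel's list_add(). */
static void list_add(struct list_head *item, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = item;
	item->next = next;
	item->prev = head;
	head->next = item;
}

/* Unlink entry, then poison its pointers, like the kernel's list_del(). */
static void list_del(struct list_head *entry)
{
	assert(entry != NULL);

	/* __list_del part: both stores dereference entry->prev / entry->next.
	 * If entry was already deleted, entry->prev is LIST_POISON2 and the
	 * store below would fault at 0x00200200. */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;

	/* poisoning part: the lui/ori/sw sequences in the disassembly */
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
```

After one list_del(&a) on a list containing only a, the head points at itself again and a.next/a.prev hold the two poison values. Calling list_del(&a) a second time is deliberately not done here, since it would write through an invalid address.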
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-08 21:18 ` Sven Eckelmann
@ 2008-09-08 21:45 ` Sven Eckelmann
2008-09-09 8:03 ` Sven Eckelmann
2008-09-09 11:26 ` Simon Wunderlich
1 sibling, 1 reply; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-08 21:45 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1.1: Type: text/plain, Size: 502 bytes --]
On Monday 08 September 2008 23:18:42 Sven Eckelmann wrote:
> Ok, I got the /proc/modules file now. Current situation is following: it
> crashes inside the the batman module add position 0x00000aa4
> [..]
I got the System.map right now. So we can convert the kernel oops into something more readable. It doesn't help that much now, but... just for the sake of completeness
END_OF_CODE+3fe1d020 is the batgat.ko when we search for c00c8650 inside /proc/modules
Best regards
Sven Eckelmann
[-- Attachment #1.2: informative_ooops.txt --]
[-- Type: text/plain, Size: 1672 bytes --]
CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == c00c8aa4, ra == c00c8a90
Cpu 0
$ 0 : 00000000 10009c00 00100100 802bb600
$ 4 : 00000000 00000001 00000000 00000000
$ 8 : 00000000 806e7a28 00000007 a106af00
$12 : 00000007 270e0000 000006d0 7d96d080
$16 : 80a23500 00000000 c00c9a28 00000064
$20 : c00d0000 00000000 000b0f4f 8071193d
$24 : 80711730 00008000
$28 : 80710000 80711890 00000000 c00c8a90
Hi : 00000140
Lo : 68fdd3c0
epc : c00c8aa4 Tainted: P
Cause : 3080000c
807119a0 000005dc 8071193d 000004c0 000210d2 05a82b1a 00000000 00000000
00020000 185352d0 2ac668cc 2ac668d4 000210d2 00000000 2ac668dc 2ac668e4
00000000 806e79f8 8007d41c 807118fc 807118fc 807118c0 00000010 807118b0
00000001 00000000 00000000 00004040 807118d0 00000010 807118b8 00000001
Call Trace:[<8007d41c>][<8005e5f0>][<8005cd04>][<8005e8d4>][<8005e0e4>][<8005cd38>][<8005d578>][<8007d0b0>][<8022656c>][<c00c8650>][<8007d108>][<8007d0e8>][<80045698>][<80045688>]
Code: 3c020010 34420100 8e110008 <ac830000> ae020000 3c020020 34420200 ac640004 16200011
Trace; 8007d41c <autoremove_wake_function+0/44>
Trace; 8005e5f0 <enqueue_entity+2fc/33c>
Trace; 8005cd04 <enqueue_task+1c/34>
Trace; 8005e8d4 <dequeue_entity+98/d8>
Trace; 8005e0e4 <try_to_wake_up+84/d8>
Trace; 8005cd38 <dequeue_task+1c/30>
Trace; 8005d578 <pick_next_task_fair+38/78>
Trace; 8007d0b0 <kthread+0/b0>
Trace; 8022656c <schedule+1e0/7d4>
Trace; c00c8650 <END_OF_CODE+3fe1d020/????>
Trace; 8007d108 <kthread+58/b0>
Trace; 8007d0e8 <kthread+38/b0>
Trace; 80045698 <kernel_thread_helper+10/18>
Trace; 80045688 <kernel_thread_helper+0/18>
[-- Attachment #1.3: proc_modules --]
[-- Type: text/plain, Size: 1954 bytes --]
sch_htb 14048 2 - Live 0xc00e5000
ath_ahb 103616 0 - Live 0xc0150000
wlan_xauth 480 0 - Live 0xc00dc000
wlan_wep 4000 0 - Live 0xc00cf000
wlan_tkip 9856 0 - Live 0xc00d8000
wlan_ccmp 5440 3 - Live 0xc00d5000
wlan_acl 1920 0 - Live 0xc00af000
ath_rate_minstrel 8352 1 - Live 0xc00d1000
ath_hal 136832 3 ath_ahb,ath_rate_minstrel, Live 0xc012d000 (P)
wlan_scan_sta 8768 1 - Live 0xc00c1000
wlan_scan_ap 6656 0 - Live 0xc00c5000
wlan 152464 10 ath_ahb,wlan_xauth,wlan_wep,wlan_tkip,wlan_ccmp,wlan_acl,ath_rate_minstrel,wlan_scan_sta,wlan_scan_ap, Live 0xc0106000
batgat 10944 1 - Live 0xc00c8000
ipt_iprange 672 0 - Live 0xc00bf000
ipt_TOS 832 0 - Live 0xc00bd000
ipt_TTL 928 0 - Live 0xc00bb000
xt_MARK 960 0 - Live 0xc00b9000
ipt_ECN 1472 0 - Live 0xc00b7000
xt_CLASSIFY 640 0 - Live 0xc00b5000
ipt_ttl 704 0 - Live 0xc00b3000
ipt_tos 544 0 - Live 0xc00b1000
ipt_time 1568 0 - Live 0xc009d000
xt_tcpmss 1088 0 - Live 0xc00ad000
xt_statistic 832 0 - Live 0xc00ab000
xt_mark 672 7 - Live 0xc00a9000
xt_mac 736 0 - Live 0xc00a7000
xt_length 736 0 - Live 0xc00a5000
ipt_ecn 1024 0 - Live 0xc00a3000
xt_DSCP 1056 0 - Live 0xc00a1000
xt_dscp 832 0 - Live 0xc009f000
imq 2096 0 - Live 0xc0097000
ipt_IMQ 672 2 - Live 0xc009b000
xt_string 896 0 - Live 0xc0099000
xt_layer7 9840 0 - Live 0xc008f000
ipt_ipp2p 6784 0 - Live 0xc0094000
ipt_LOG 4640 0 - Live 0xc0088000
xt_CHAOS 1792 0 - Live 0xc008d000
xt_DELUDE 2624 1 - Live 0xc008b000
xt_TARPIT 2816 1 - Live 0xc0084000
xt_quota 800 0 - Live 0xc0086000
xt_portscan 2016 0 - Live 0xc0066000
xt_pkttype 704 0 - Live 0xc0082000
xt_physdev 1488 0 - Live 0xc0080000
ipt_owner 800 0 - Live 0xc007e000
iptable_raw 832 0 - Live 0xc007c000
xt_NOTRACK 832 0 - Live 0xc0076000
xt_CONNMARK 1088 0 - Live 0xc0074000
ipt_recent 4992 0 - Live 0xc0079000
xt_helper 992 0 - Live 0xc0072000
xt_conntrack 1312 0 - Live 0xc0070000
xt_connmark 832 0 - Live 0xc006e000
xt_connbytes 1312 0 - Live 0xc0068000
tun 6592 0 - Live 0xc006b000
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
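The lookup of c00c8650 in /proc/modules done by hand above can be sketched as a small helper. This is only an illustration: module_contains is a hypothetical function (not part of batgat or any kernel tool), and it assumes the line layout seen in the attachment, i.e. module name and size as the first two fields and the load address as the first " 0x..." field on the line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: check whether addr lies inside the module described
 * by one /proc/modules line. On a hit, name_out holds the module name and
 * *offset_out the offset of addr from the module's load address.
 * Returns 1 on a hit, 0 otherwise (name_out may still be overwritten).
 */
static int module_contains(const char *line, unsigned long addr,
			   char name_out[64], unsigned long *offset_out)
{
	unsigned long size, base;
	const char *hex;

	assert(line != NULL && name_out != NULL && offset_out != NULL);

	/* first two fields: module name and size in bytes */
	if (sscanf(line, "%63s %lu", name_out, &size) != 2)
		return 0;

	/* load address: assumed to be the first " 0x..." field on the line */
	hex = strstr(line, " 0x");
	if (hex == NULL)
		return 0;
	base = strtoul(hex + 1, NULL, 16);

	if (addr < base || addr - base >= size)
		return 0;

	*offset_out = addr - base;
	return 1;
}
```

With the line "batgat 10944 1 - Live 0xc00c8000", the address c00c8650 from the call trace falls inside batgat at offset 0x650, and the faulting epc c00c8aa4 gives offset 0xaa4, which is the position in the disassembly quoted earlier in the thread.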
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-08 21:45 ` Sven Eckelmann
@ 2008-09-09 8:03 ` Sven Eckelmann
0 siblings, 0 replies; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-09 8:03 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1.1: Type: text/plain, Size: 351 bytes --]
On Monday 08 September 2008 23:45:55 Sven Eckelmann wrote:
> I got the System.map right now. So we can convert the kernel oops into
> something more readable. It doesn't help that much now, but... just for
> sake of completeness
Sorry, I forgot the second oops with the interesting address of the paging failure.
Best regards
Sven Eckelmann
[-- Attachment #1.2: informative_ooops2.txt --]
[-- Type: text/plain, Size: 4199 bytes --]
CPU 0 Unable to handle kernel paging request at virtual address 00200200, epc == c00c8aa4, ra == c00c8a90
Cpu 0
$ 0 : 00000000 10009c00 00100100 00100100
$ 4 : 00200200 00000001 00000000 00000000
$ 8 : 00000000 8071aa28 0000000b 127a3980
$12 : 0000000b ebc20000 0000045d 67350e80
$16 : 80ac1600 00000000 c00c9a28 00000064
$20 : c00d0000 00000000 0006ab6e 8071d93d
$24 : 8071d730 00008000
$28 : 8071c000 8071d890 00000000 c00c8a90
Hi : 00000140
Lo : 68fdd3c0
epc : c00c8aa4 Tainted: P
Cause : 3080000c
8071d9a0 000005dc 8071d93d 00000054 000210d2 05a82b6e 00000000 00000000
00020000 c00505f1 8026dd80 8071db60 000210d2 00000000 00000000 801ca5e8
00000000 8071a9f8 8007d41c 8071d8fc 8071d8fc 8071d8c0 00000010 8071d8b0
00000001 00000000 00000000 00004040 8071d8d0 00000010 8071d8b8 00000001
Call Trace:[<801ca5e8>][<8007d41c>][<801c018c>][<801ba0d0>][<801bbf74>][<8020f0d4>][<8020f95c>][<8020f95c>][<c015b2f4>][<c0161e80>][<8008bfe0>][<8008dfac>][<801ca110>][<800431e8>][<800437a4>][<c0106840>][<80050000>][<c01549f0>][<c015f7b0>][<c015f7f0>][<c0161e80>][<80079f5c>][<8006f8d0>][<8008bfe0>][<8006b778>][<8006b1e0>][<8006b2c4>][<c015f694>][<800437a4>][<80279960>][<8005fa64>][<800ba4d4>][<8005e8d4>][<8005cd38>][<8005d578>][<800b6d24>][<800b6d1c>][<802276d8>][<8022656c>][<8006704c>][<80067044>][<800691d4>][<80072f64>][<80072e54>][<80069290>][<80073a00>][<8005e5f0>][<80046aa0>][<8005e5f0>][<8005cd04>][<8005e8d4>][<8005e0e4>][<8005cd38>][<8005d578>][<8007d0b0>][<8022656c>][<c00c8650>][<8007d108>][<8007d0e8>][<80045698>][<80045688>]
Code: 3c020010 34420100 8e110008 <ac830000> ae020000 3c020020 34420200 ac640004 16200011
>>???; c00c8aa4 <END_OF_CODE+3fe1d474/????> <=====
Trace; 801ca5e8 <ip_local_deliver_finish+0/2c0>
Trace; 8007d41c <autoremove_wake_function+0/44>
Trace; 801c018c <udp_packet+f0/114>
Trace; 801ba0d0 <nf_conntrack_find_get+c8/dc>
Trace; 801bbf74 <nf_conntrack_in+4ac/6f8>
Trace; 8020f0d4 <ipt_do_table+50c/588>
Trace; 8020f95c <nf_nat_fn+20c/244>
Trace; 8020f95c <nf_nat_fn+20c/244>
Trace; c015b2f4 <END_OF_CODE+3feafcc4/????>
Trace; c0161e80 <END_OF_CODE+3feb6850/????>
Trace; 8008bfe0 <handle_IRQ_event+64/d4>
Trace; 8008dfac <handle_level_irq+c0/114>
Trace; 801ca110 <ip_rcv_finish+0/4d8>
Trace; 800431e8 <ar5315_irq_dispatch+26c/2a4>
Trace; 800437a4 <ret_from_irq+0/4>
Trace; c0106840 <END_OF_CODE+3fe5b210/????>
Trace; 80050000 <blast_icache64_page_indexed+0/e4>
Trace; c01549f0 <END_OF_CODE+3fea93c0/????>
Trace; c015f7b0 <END_OF_CODE+3feb4180/????>
Trace; c015f7f0 <END_OF_CODE+3feb41c0/????>
Trace; c0161e80 <END_OF_CODE+3feb6850/????>
Trace; 80079f5c <rcu_process_callbacks+1c/38>
Trace; 8006f8d0 <run_timer_softirq+20/1fc>
Trace; 8008bfe0 <handle_IRQ_event+64/d4>
Trace; 8006b778 <tasklet_action+118/198>
Trace; 8006b1e0 <__do_softirq+78/100>
Trace; 8006b2c4 <do_softirq+5c/94>
Trace; c015f694 <END_OF_CODE+3feb4064/????>
Trace; 800437a4 <ret_from_irq+0/4>
Trace; 80279960 <cpu_probe+584/994>
Trace; 8005fa64 <__wake_up_sync+3c/74>
Trace; 800ba4d4 <__fput+188/1cc>
Trace; 8005e8d4 <dequeue_entity+98/d8>
Trace; 8005cd38 <dequeue_task+1c/30>
Trace; 8005d578 <pick_next_task_fair+38/78>
Trace; 800b6d24 <filp_close+74/90>
Trace; 800b6d1c <filp_close+6c/90>
Trace; 802276d8 <cond_resched+44/5c>
Trace; 8022656c <schedule+1e0/7d4>
Trace; 8006704c <put_files_struct+188/208>
Trace; 80067044 <put_files_struct+180/208>
Trace; 800691d4 <do_exit+960/96c>
Trace; 80072f64 <dequeue_signal+13c/17c>
Trace; 80072e54 <dequeue_signal+2c/17c>
Trace; 80069290 <sys_exit_group+0/c>
Trace; 80073a00 <get_signal_to_deliver+444/498>
Trace; 8005e5f0 <enqueue_entity+2fc/33c>
Trace; 80046aa0 <do_notify_resume+64/3ec>
Trace; 8005e5f0 <enqueue_entity+2fc/33c>
Trace; 8005cd04 <enqueue_task+1c/34>
Trace; 8005e8d4 <dequeue_entity+98/d8>
Trace; 8005e0e4 <try_to_wake_up+84/d8>
Trace; 8005cd38 <dequeue_task+1c/30>
Trace; 8005d578 <pick_next_task_fair+38/78>
Trace; 8007d0b0 <kthread+0/b0>
Trace; 8022656c <schedule+1e0/7d4>
Trace; c00c8650 <END_OF_CODE+3fe1d020/????>
Trace; 8007d108 <kthread+58/b0>
Trace; 8007d0e8 <kthread+38/b0>
Trace; 80045698 <kernel_thread_helper+10/18>
Trace; 80045688 <kernel_thread_helper+0/18>
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-08 21:18 ` Sven Eckelmann
2008-09-08 21:45 ` Sven Eckelmann
@ 2008-09-09 11:26 ` Simon Wunderlich
2008-09-09 22:45 ` Sven Eckelmann
1 sibling, 1 reply; 9+ messages in thread
From: Simon Wunderlich @ 2008-09-09 11:26 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 2826 bytes --]
Hey Sven,
thanks for your analysis!!
On Mon, Sep 08, 2008 at 11:18:42PM +0200, Sven Eckelmann wrote:
> Ok, I got the /proc/modules file now. The current situation is the following: it
> crashes inside the batman module at position 0x00000aa4
>
> a60: 3c020000 lui v0,0x0
> a64: 8c500024 lw s0,36(v0)
> a68: 24420024 addiu v0,v0,36
> a6c: 12020014 beq s0,v0,ac0 <cleanup_module+0x610>
> a70: 3c040000 lui a0,0x0
> a74: 3c050000 lui a1,0x0
> a78: 3c020000 lui v0,0x0
> a7c: 24840000 addiu a0,a0,0
> a80: 24a50088 addiu a1,a1,136
> a84: 24420000 addiu v0,v0,0
> a88: 0040f809 jalr v0
> a8c: 24060283 li a2,643
> a90: 8e040004 lw a0,4(s0)
> a94: 8e030000 lw v1,0(s0)
> a98: 3c020010 lui v0,0x10
> a9c: 34420100 ori v0,v0,0x100
> aa0: 8e110008 lw s1,8(s0)
> aa4: ac830000 sw v1,0(a0)
> aa8: ae020000 sw v0,0(s0)
> aac: 3c020020 lui v0,0x20
> ab0: 34420200 ori v0,v0,0x200
> ab4: ac640004 sw a0,4(v1)
>
> This is part of the compiled version of packet_recv_thread. Due to the
> optimizations done I cannot say where exactly the problem lies.
>
> I think the code of get_ip_addr() got inlined in packet_recv_thread and we
> need to search for the crash inside of it at list_del(&entry->list);
> I would also say that the real crash is inside __list_del, where prev and
> next are set. To check it, look at LIST_POISON1 and LIST_POISON2 inside
> poison.h of the current linux kernel. You will notice that the values are
> 0x00100100 and 0x00200200 == address of the failed paging request. The list
> poisoning is done in list_del after calling __list_del (it is the
> sequence lui, ori, sw in the asm snippet). So could it be that we have a
> poisoned entry inside the list?
> This could for example happen when we get scheduled (please notice that the
> optimizer reordered many instructions) while another part of the program is
> deleting entries. I haven't checked the rest of the code to see if that really
> could happen, but that is my current idea.
Mhm, as far as I looked into the issue, these are the points where free_client_list is accessed:
init_module() - INIT_LIST_HEAD()
 * called on startup
get_ip_addr() - list_del():
 * "secured" with a hash_lock spinlock
cleanup_module() - list_del():
 * only called when unloading the module
batgat_ioctl() - list_del()
 * from IOCREMDEV. This is called when batman shuts down.
packet_recv_thread - list_add():
 * also secured with a hash_lock spinlock.
So it seems there should be no concurrency without user interaction (module or batman shutdown).
But I don't have a good idea yet where the problem comes from ... :/
best regards,
Simon
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-09 11:26 ` Simon Wunderlich
@ 2008-09-09 22:45 ` Sven Eckelmann
2008-09-10 9:50 ` Sven Eckelmann
0 siblings, 1 reply; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-09 22:45 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 4523 bytes --]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Tuesday 09 September 2008 13:26:47 Simon Wunderlich wrote:
> Hey Sven,
>
> thanks for you analysis!!
>
> [...]
>
> Mhm, as far as i looked into the issue, there are the following
> points where free_client_list is accessed:
> [...]
> So it seems there should be no concurrency without user interaction
> (module or batman shutdown).
> But i don't have a good idea yet where the problem comes from ... :/
Yes, the idea of the race condition was stupid. So what is the real problem? Maybe a compiler bug? Let's check the assembler stuff:
The stuff around list_del in get_ip_addr:
a90: lw a0,4(s0) /* a0 gets our prev pointer */
a94: lw v1,0(s0) /* v1 gets our next pointer */
a98: lui v0,0x10 /* load 0x100100 in v0 */
a9c: ori v0,v0,0x100
aa0: lw s1,8(s0) /* load pointer to gw_client in s1 */
aa4: sw v1,0(a0) /* store our next pointer in the next pointer of prev
                    ****crash**** because 0x200200 or 0x0 was in our prev
                    pointer (a0) - why do we have a poisoned prev pointer
                    when we are probably the first entry of the list
                    -> is the list not correctly initialised or aren't we
                    added correctly? */
aa8: sw v0,0(s0) /* store poison in next pointer */
aac: lui v0,0x20 /* load 0x200200 in v0 */
ab0: ori v0,v0,0x200
ab4: sw a0,4(v1) /* store our prev pointer in the prev pointer of next */
The initialisation of the list is done in init_module. prev and next should be set to the list address. So let's search for it:
/* "zero" means here the position where our module was loaded to. */
1374: lui a1,0x0 /* set a1 to "zero" */
1378: addiu v1,a1,36 /* 36 is the position of the structure free_client_list, so we set v1 to it */
137c: lui a0,0x0 /* set a0 to "zero" */
1380: sw v0,28(a0) /* store pointer to wp_hash */
1384: sw v1,36(a1) /* store free_client_list.next as free_client_list */
1388: sw v1,4(v1) /* store free_client_list.prev as free_client_list */
So this looks good too. So when we have an entry, we must have added it somewhere. Let's take a look at packet_recv_thread again, where the list_add is:
/* v0 and v1 hold the pointer to the newly allocated struct */
/* a3 holds "zero" - like t0 */
9a8: sw s1,8(v0) /* store pointer to client_data in new data buffer */
9ac: lw v0,36(a3) /* v0 gets free_client_list.next -> let's call it next_element */
                  /* shouldn't there be another instruction between the load and the usage of the register? - like a nop */
9b0: sw v0,0(v1) /* store next_element in tmp_entry.list.next */
9b4: sw v1,4(v0) /* store pointer to tmp_entry in next_element.prev */
9b8: sw v1,36(a3) /* store pointer to tmp_entry in free_client_list.next */
9bc: j 9d4 <cleanup_module+0x524>
9c0: sw t0,4(v1) /* saves free_client_list in tmp_entry.prev
                    << this sits in the delay slot of the jump; if it were
                    not executed together with the jump we would get wrong
                    data here, but because we are using mips it must be
                    executed */
So I cannot see anything special - only these two instructions. Maybe we should create a version with debug output after list_add that prints the next and prev pointers of tmp_entry and free_client_list, and the same for entry before calling list_del in get_ip_addr. This should be compiled for the nightwing and sent to Outback Dingo so he can test it and send us the kernel log after a crash.
The problem is that I added some printks, checked the resulting assembler output and noticed that the generated code changed (so the interesting parts aren't there anymore). So if it runs without problems and prints some "use free client from list" in the kernel log, then we should have fixed it by reordering some instructions. If not... we should try to find it by using the extra debugging output.
If it runs without problems, please remove the printks around the list_add, and if it still runs and prints some "use free client from list" then... do it the other way around (keep the printks around list_add and remove the one before list_del).
...just my ideas to find the problem.
Best regards
Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEARECAAYFAkjG/HsACgkQqQGwKVlMoDv6fACg90wX35fyHR13/Dh/nBvrKM4C
euwAn03zpb+HqWccdjcf7Z7SotWd+1s0
=pwTz
-----END PGP SIGNATURE-----
[-- Attachment #2: batgat_test.patch --]
[-- Type: text/x-patch, Size: 1204 bytes --]
--- a/batman/linux/modules/gateway.c
+++ b/batman/linux/modules/gateway.c
@@ -385,7 +385,11 @@ static int packet_recv_thread(void *data)
 				tmp_entry = kmalloc(sizeof(struct free_client_data), GFP_KERNEL);
 				if(tmp_entry != NULL) {
 					tmp_entry->gw_client = client_data;
+					printk("list_add_b; tmp_entry pointers (%p, %p)\n", tmp_entry->list.prev, tmp_entry->list.next);
+					printk("list_add_b; free_client_list pointers (%p, %p)\n", free_client_list.prev, free_client_list.next);
 					list_add(&tmp_entry->list,&free_client_list);
+					printk("list_add_a; tmp_entry pointers (%p, %p)\n", tmp_entry->list.prev, tmp_entry->list.next);
+					printk("list_add_a; free_client_list pointers (%p, %p)\n", free_client_list.prev, free_client_list.next);
 				} else
 					DBG("can't add free gw_client to free list");
 
@@ -642,6 +646,7 @@ static struct gw_client *get_ip_addr(struct sockaddr_in *client_addr)
 	list_for_each_entry_safe(entry, next, &free_client_list, list) {
 		DBG("use free client from list");
 		gw_client = entry->gw_client;
+		printk("free client; entry pointers (%p, %p)\n", entry->list.prev, entry->list.next);
 		list_del(&entry->list);
 		break;
 	}
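As a complement to printing raw pointers, the state of an entry could also be classified before it is unlinked. The following userspace sketch is an assumption about how such a check might look, similar in spirit to the sanity checks the kernel performs when built with CONFIG_DEBUG_LIST; the enum and the function check_entry are hypothetical names and not part of batgat. It distinguishes the two fault patterns seen in the oopses (a NULL link and a poisoned link) from an entry whose neighbours no longer point back at it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Assumption: poison values as quoted in the thread (linux/poison.h). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

enum list_state {
	LIST_OK,               /* entry is properly linked */
	LIST_NULL_LINK,        /* next or prev is NULL (never added?) */
	LIST_POISONED,         /* entry was already list_del'ed */
	LIST_BROKEN_NEIGHBOUR  /* neighbours do not point back at entry */
};

/* Hypothetical check to run on an entry right before list_del(). */
static enum list_state check_entry(const struct list_head *entry)
{
	assert(entry != NULL);

	if (entry->next == NULL || entry->prev == NULL)
		return LIST_NULL_LINK;
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
		return LIST_POISONED;
	/* only dereference the neighbours once they look plausible */
	if (entry->prev->next != entry || entry->next->prev != entry)
		return LIST_BROKEN_NEIGHBOUR;
	return LIST_OK;
}
```

In a debug build, a result other than LIST_OK could be logged and the list_del skipped, which would turn the oops into a log line naming the kind of corruption. Whether that is acceptable inside the hash_lock section is a separate question.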
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-09 22:45 ` Sven Eckelmann
@ 2008-09-10 9:50 ` Sven Eckelmann
0 siblings, 0 replies; 9+ messages in thread
From: Sven Eckelmann @ 2008-09-10 9:50 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 1289 bytes --]
On Wednesday 10 September 2008 00:45:07 Sven Eckelmann wrote:
> [...]
> /* v0 and v1 holds pointer to new allocated struct */
> /* a3 holds "zero" - like t0 */
> 9a8: sw s1,8(v0) /* store pointer to client_data in new data buffer */
> 9ac: lw v0,36(a3) /* v0 gets free_client_list.next -> lets call it
> next_element */
> /* shouldn't be another instruction between load and usage of the
> register? - like a nop */
> 9b0: sw v0,0(v1) /* store next_element in tmp_entry.list.next */
> 9b4: sw v1,4(v0) /* store pointer to tmp_entry next_element.prev */
> 9b8: sw v1,36(a3) /* store pointer to tmp_entry in freeclient_list.next
> */
> 9bc: j 9d4 <cleanup_module+0x524>
> 9c0: sw t0,4(v1) /* saves freeclient_list in tmp_entry.prev << if this
> would not be executed in parallel, we would get
> wrong data here, but because we are using mips it
> must be executed */
> [...]
Ok, I searched the official mips32 instruction set reference, and both questionable instructions are strictly defined there, without any "unpredictable" or "undefined" remarks. So they should be working fine.
Best regards
Sven Eckelmann
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
* Re: [B.A.T.M.A.N.] Batman gateway lock ups
2008-09-05 15:01 [B.A.T.M.A.N.] Batman gateway lock ups Outback Dingo
2008-09-08 7:08 ` Sven Eckelmann
2008-09-08 21:18 ` Sven Eckelmann
@ 2009-07-21 20:48 ` Simon Wunderlich
2 siblings, 0 replies; 9+ messages in thread
From: Simon Wunderlich @ 2009-07-21 20:48 UTC (permalink / raw)
To: The list for a Better Approach To Mobile Ad-hoc Networking
[-- Attachment #1: Type: text/plain, Size: 3067 bytes --]
Hello Dingo,
I know it was a long time ago, but can you please verify whether the problem described in Ticket 121 [1] still affects you? Or is it already solved? The entries are 10 months old and the issue is probably already fixed.
Thank you very much,
Simon
[1] https://www.open-mesh.net/ticket/121
On Fri, Sep 05, 2008 at 10:01:24PM +0700, Outback Dingo wrote:
> see pastebin
>
> http://www.pastebin.ca/1194874
>
> pertinent info
> dmesg | grep 'batgat loaded'
> batgat: [init_module:96] batgat loaded rv1025
> uname -a
> Linux nightwing 2.6.23.16 #16 Tue Apr 22 20:00:17 ART 2008 mips unknown
> root@nightwing:~# batmand -v
> WARNING: You are using the unstable batman branch. If you are interested in
> *using* batman get the latest stable release !
> B.A.T.M.A.N. 0.3-beta (compatibility version 5)
> lsmod
>
> Module Size Used by Tainted: P
> sch_htb 14048 2
> ath_ahb 103616 0
> wlan_xauth 480 0
> wlan_wep 4000 0
> wlan_tkip 9856 0
> wlan_ccmp 5440 2
> wlan_acl 1920 0
> ath_rate_minstrel 8352 1
> ath_hal 136832 3 ath_ahb,ath_rate_minstrel
> wlan_scan_sta 8768 1
> wlan_scan_ap 6656 0
> wlan 152464 10 ath_ahb,wlan_xauth,wlan_wep,wlan_tkip,wlan_ccmp,wlan_acl,ath_rate_minstrel,wlan_scan_sta,wlan_scan_ap
> batgat 10944 1
> ipt_iprange 672 0
> ipt_TOS 832 0
> ipt_TTL 928 0
> xt_MARK 960 3
> ipt_ECN 1472 0
> xt_CLASSIFY 640 0
> ipt_ttl 704 0
> ipt_tos 544 0
> ipt_time 1568 0
> xt_tcpmss 1088 0
> xt_statistic 832 0
> xt_mark 672 7
> xt_mac 736 3
> xt_length 736 0
> ipt_ecn 1024 0
> xt_DSCP 1056 0
> xt_dscp 832 0
> imq 2096 0
> ipt_IMQ 672 2
> xt_string 896 0
> xt_layer7 9840 0
> ipt_ipp2p 6784 0
> ipt_LOG 4640 0
> xt_CHAOS 1792 0
> xt_DELUDE 2624 1
> xt_TARPIT 2816 1
> xt_quota 800 0
> xt_portscan 2016 0
> xt_pkttype 704 0
> xt_physdev 1488 0
> ipt_owner 800 0
> iptable_raw 832 0
> xt_NOTRACK 832 0
> xt_CONNMARK 1088 0
> ipt_recent 4992 0
> xt_helper 992 0
> xt_conntrack 1312 0
> xt_connmark 832 0
> xt_connbytes 1312 0
> tun 6592 0
> _______________________________________________
> B.A.T.M.A.N mailing list
> B.A.T.M.A.N@open-mesh.net
> https://list.open-mesh.net/mm/listinfo/b.a.t.m.a.n
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]