From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Pedersen
Date: Mon, 12 Mar 2012 19:13:44 -0700
Subject: [ath9k-devel] BUG: transmit buffer overflow?
Message-ID: <20120313021344.GA2190@shredder>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

Hi ath9k-devel,

We received the following bug report:

Forwarded conversation
Subject: transmit buffer overflow ?
------------------------

From: *David Fulgham*
Date: Sun, Mar 4, 2012 at 8:19 PM
To: devel

I've been experiencing an issue with our mesh network and mesh nodes repeating constantly. After some troubleshooting it looks like I may have come across a transmit buffer overrun issue or the like.

I've been able to reproduce it quite readily by simply pushing as many packets as possible through the mesh between two nodes that have a low signal level connection (i.e. < 75dbm), so that a larger number of packets has to be re-transmitted.

PC -> MeshNode1 ~~ <75dbm link ~~ MeshNode2

I then ping mesh node 2 from the PC with a command like ping -i .01, and after a few minutes (I assume the transmit buffer fills up and overflows) one or both of the radios will reboot. I can make the reboot happen pretty much anytime I want by moving the radios almost out of range of each other.

I'm using the openwrt trunk snapshot from Feb 26th on UBNT Rocket 5M devices, and thus don't have any debugging turned on.

I have determined that the radio reboot I am experiencing only happens if the mesh interface is bridged with another interface (i.e. a mesh portal, mesh/ap bridge or mesh/eth bridge). If the mesh interface is brought up and not bridged, the router doesn't seem to have the reboot issue. All of my mesh nodes require a bridge to either an access point or an Ethernet network.

[describing a bridged scenario below..]
As you can see in the video, I am able to ping both routers (left on top, right on bottom), and this works well. However, when I put my hand over the antenna ports on the left radio to reduce the signal level enough that the radios communicate poorly and have to retry transmissions, the radio on the right reboots, followed closely by the radio on the left. Both radios reboot. This is happening constantly in the field on all mesh radios in my 5.8 GHz mesh.

root@OpenWrt:/# cat /sys/kernel/debug/crashlog
Time: 1331498343.850201
Modules: ohci_hcd@83120000+3de0 nf_nat_irc@830f7000+360 nf_conntrack_irc@830db000+9f0 nf_nat_ftp@830ee000+420 nf_conntrack_ftp@830f4000+1190 ipt_MASQUERADE@830c5000+420 iptable_nat@830d5000+8d5 nf_nat@830dc000+2917 xt_conntrack@830ed000+850 xt_CT@830ea000+510 xt_NOTRACK@830b1000+210 iptable_raw@83b08000+280 xt_state@83b0f000+2b0 nf_conntrack_ipv4@830b5000+f94 nf_defrag_ipv4@83bfb000+2e6 nf_conntrack@830e0000+99e2 ehci_hcd@830c8000+7f70 pppoe@830bc000+1d80 pppox@83b0b000+55a ipt_REJECT@830b4000+760 xt_TCPMSS@83a09000+770 ipt_LOG@830b2000+17f0 xt_comment@83b09000+1e0 xt_multiport@83bfa000+4a0 xt_mac@830af000+260 xt_limit@83aed000+400 iptable_mangle@83b0c000+390 iptable_filter@83a26000+2c0 ip_tables@83afc000+237d xt_tcpudp@83ae5000+6b0 x_tables@83a50000+2ad5 ppp_async@83aba000+1790 ppp_generic@830a8000+4a42 slhc@839aa000+11eb ath9k@83b20000+1395d ath9k_common@839be000+5e9 ath9k_hw@83b80000+4f29b ath@839b8000+3939 mac80211@83b40000+3cac4 usbcore@83aa0000+19564 usb_common@83ae4000+232 nls_base@839a8000+130e crc_ccitt@839cd000+3fb cfg80211@83ac0000+22fb2 compat@83a34000+2f0 arc4@83a4b000+350 aes_generic@83a38000+7509 crypto_algapi@83a10000+28a1 leds_gpio@839da000+660 gpio_button_hotplug@8398b000+cc0
<5>[ 0.000000] Linux version 3.2.9 (user@leesee-ThinkPad-R61e) (gcc version 4.6.3 20120201 (prerelease) (Linaro GCC 4.6-2012.02) ) #5 Sun Mar 11 16:20:22 EDT 2012
<7>[ 0.000000] MyLoader: sysp=00000000, boardp=00000000, parts=ffffffff
<7>[ 0.000000] free_area_init_node: node 0, pgdat 802edcf0, node_mem_map 81000000
<6>[ 0.000000] Memory: 61636k/65536k available (2063k kernel code, 3900k reserved, 553k data, 192k init, 0k highmem)
<5>[ 7.030000] JFFS2 notice: (1) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
<6>[ 7.050000] VFS: Mounted root (jffs2 filesystem) readonly on device 31:3.
<6>[ 7.050000] Freeing unused kernel memory: 192k freed
<7>[ 13.150000] Registered led device: ubnt:red:link1
<7>[ 13.150000] Registered led device: ubnt:orange:link2
<7>[ 13.150000] Registered led device: ubnt:green:link3
<7>[ 13.150000] Registered led device: ubnt:green:link4
<6>[ 16.410000] eth0: link down
<6>[ 17.010000] Compat-wireless backport release: compat-wireless-2012-02-27-1-r30877
<6>[ 17.010000] Backport based on wireless-testing.git master-2012-02-27
<6>[ 17.340000] cfg80211: Calling CRDA to update world regulatory domain
<6>[ 17.720000] usbcore: registered new interface driver usbfs
<6>[ 17.730000] usbcore: registered new interface driver hub
<6>[ 17.740000] usbcore: registered new device driver usb
<6>[ 18.500000] cfg80211: World regulatory domain updated:
<6>[ 18.500000] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
<6>[ 18.510000] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<6>[ 18.520000] cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
<6>[ 18.530000] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
<6>[ 18.540000] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<6>[ 18.540000] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<4>[ 19.280000] PCI: Enabling device 0000:00:00.0 (0000 -> 0002)
<7>[ 19.290000] PCI: Setting latency timer of device 0000:00:00.0 to 64
<7>[ 19.290000] ath: EEPROM regdomain: 0x0
<7>[ 19.290000] ath: EEPROM indicates default country code should be used
<7>[ 19.290000] ath: doing EEPROM country->regdmn map search
<7>[ 19.290000] ath: country maps to regdmn code: 0x3a
<7>[ 19.290000] ath: Country alpha2 being used: US
<7>[ 19.290000] ath: Regpair used: 0x3a
<7>[ 19.300000] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
<7>[ 19.300000] Registered led device: ath9k-phy0
<6>[ 19.300000] ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xb0000000, irq=40
<6>[ 19.310000] cfg80211: Calling CRDA for country: US
<6>[ 19.370000] cfg80211: Regulatory domain changed to country: US
<6>[ 19.380000] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
<6>[ 19.380000] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
<6>[ 19.390000] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
<6>[ 19.400000] cfg80211: (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<6>[ 19.410000] cfg80211: (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<6>[ 19.420000] cfg80211: (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
<6>[ 19.420000] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
<6>[ 19.510000] PPP generic driver version 2.4.2
<6>[ 19.630000] ip_tables: (C) 2000-2006 Netfilter Core Team
<6>[ 19.800000] NET: Registered protocol family 24
<6>[ 19.930000] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
<6>[ 20.050000] nf_conntrack version 0.5.0 (966 buckets, 3864 max)
<6>[ 20.380000] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
<6>[ 20.390000] ath79-ohci ath79-ohci: Atheros built-in OHCI controller
<6>[ 20.390000] ath79-ohci ath79-ohci: new USB bus registered, assigned bus number 1
<6>[ 20.400000] ath79-ohci ath79-ohci: irq 14, io mem 0x1b000000
<6>[ 20.470000] hub 1-0:1.0: USB hub found
<6>[ 20.470000] hub 1-0:1.0: 1 port detected
<6>[ 22.210000] device eth0 entered promiscuous mode
<6>[ 22.750000] eth0: link up (100Mbps/Full duplex)
<6>[ 22.750000] br-lan: port 1(eth0) entering forwarding state
<6>[ 22.760000] br-lan: port 1(eth0) entering forwarding state
<6>[ 23.620000] device wlan0 entered promiscuous mode
<6>[ 23.630000] br-lan: port 2(wlan0) entering forwarding state
<6>[ 23.630000] br-lan: port 2(wlan0) entering forwarding state
<6>[ 24.860000] device eth0 left promiscuous mode
<6>[ 24.860000] br-lan: port 1(eth0) entering forwarding state
<6>[ 25.280000] device eth0 entered promiscuous mode
<6>[ 25.280000] br-lan: port 1(eth0) entering forwarding state
<6>[ 25.290000] br-lan: port 1(eth0) entering forwarding state
<1>[ 106.180000] CPU 0 Unable to handle kernel paging request at virtual address 00000004, epc == 83b29074, ra == 83b28fc8
<4>[ 106.190000] Oops[#1]:
<4>[ 106.190000] Cpu 0
<4>[ 106.190000] $ 0 : 00000000 80320000 00000000 00000000
<4>[ 106.190000] $ 4 : 00000000 83a74000 00000000 0000002f
<4>[ 106.190000] $ 8 : 00000018 80163458 831736c0 00000001
<4>[ 106.190000] $12 : 00000012 0012002b 00000001 00000050
<4>[ 106.190000] $20 : 83b39a70 00000000 83a05b70 0000002f
<4>[ 106.190000] $24 : 00000003 83b2151c
<4>[ 106.190000] $28 : 802d4000 802d5b78 0000002f 83b28fc8
<4>[ 106.190000] Hi : 00000005
<4>[ 106.190000] Lo : 00000000
<4>[ 106.190000] epc : 83b29074 ath_txchainmask_reduction+0x154/0x1170 [ath9k]
<4>[ 106.190000] Tainted: G O
<4>[ 106.190000] ra : 83b28fc8 ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
<4>[ 106.190000] Status: 1000d403 KERNEL EXL IE
<4>[ 106.190000] Cause : 00800008
<4>[ 106.190000] BadVA : 00000004
<4>[ 106.190000] PrId : 00019374 (MIPS 24Kc)
<4>[ 106.190000] Modules linked in: ohci_hcd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_conntrack xt_CT xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ehci_hcd pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc ath9k(O) ath9k_common(O) ath9k_hw(O) ath(O) mac80211(O) usbcore usb_common nls_base crc_ccitt cfg80211(O) compat(O) arc4 aes_generic crypto_algapi leds_gpio gpio_button_hotplug(O)
<4>[ 106.190000] Process swapper (pid: 0, threadinfo=802d4000, task=802d7c80, tls=00000000)
<4>[ 106.190000] Stack : 00000027 228deb25 00215c15 f6c1c874 88030000 00000000 00010100 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] ...
<4>[ 106.190000] Call Trace:
<4>[ 106.190000] [<83b29074>] ath_txchainmask_reduction+0x154/0x1170 [ath9k]
<4>[ 106.190000] [<83b28fc8>] ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
<4>[ 106.190000]
<4>[ 106.190000]
<4>[ 106.190000] Code: 00831821 70522002 8c630000 <8c630004> 00839021 8e2308ec 92420007 30630020 10600004
<4>[ 106.400000] ---[ end trace 574d10138db4623f ]---
===================================
Time: 1331498343.857098
on+0x154/0x1170 [ath9k]
<4>[ 106.190000] Tainted: G O
<4>[ 106.190000] ra : 83b28fc8 ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
<4>[ 106.190000] Status: 1000d403 KERNEL EXL IE
<4>[ 106.190000] Cause : 00800008
<4>[ 106.190000] BadVA : 00000004
<4>[ 106.190000] PrId : 00019374 (MIPS 24Kc)
<4>[ 106.190000] Modules linked in: ohci_hcd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_conntrack xt_CT xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ehci_hcd pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc ath9k(O) ath9k_common(O) ath9k_hw(O) ath(O) mac80211(O) usbcore usb_common nls_base crc_ccitt cfg80211(O) compat(O) arc4 aes_generic crypto_algapi leds_gpio gpio_button_hotplug(O)
<4>[ 106.190000] Process swapper (pid: 0, threadinfo=802d4000, task=802d7c80, tls=00000000)
<4>[ 106.190000] Stack : 00000027 228deb25 00215c15 f6c1c874 88030000 00000000 00010100 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<4>[ 106.190000] ...
<4>[ 106.190000] Call Trace:
<4>[ 106.190000] [<83b29074>] ath_txchainmask_reduction+0x154/0x1170 [ath9k]
<4>[ 106.190000] [<83b28fc8>] ath_txchainmask_reduction+0xa8/0x1170 [ath9k]
<4>[ 106.190000]
<4>[ 106.190000]
<4>[ 106.190000] Code: 00831821 70522002 8c630000 <8c630004> 00839021 8e2308ec 92420007 30630020 10600004
<4>[ 106.400000] ---[ end trace 574d10138db4623f ]---
<0>[ 106.410000] Kernel panic - not syncing: Fatal exception in interrupt

David.

----- End forwarded message -----
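
A note on reading the oops: the word in angle brackets on the Code: line is the instruction at epc. The sketch below is only an illustration and is not part of the original report. It decodes that word as a MIPS32 I-type load. The helper names (faulting_word, decode_itype, describe) and the small opcode table are hypothetical and cover just a few load opcodes, so treat it as a reading aid and not a disassembler. Under those assumptions, 8c630004 is a word load from offset 4 of register $3, and the register dump shows $3 = 00000000, which matches BadVA 00000004. That would be consistent with reading a field through a NULL pointer, though it does not say which pointer in ath9k was NULL.

```python
import re

# "Code:" line copied from the oops; the word in <...> is the
# instruction at epc (83b29074).
CODE_LINE = ("00831821 70522002 8c630000 <8c630004> 00839021 "
             "8e2308ec 92420007 30630020 10600004")

# $0..$3 as printed on the "$ 0 :" line of the register dump.
REGS_0_TO_3 = [0x00000000, 0x80320000, 0x00000000, 0x00000000]

# Hypothetical lookup table: a few MIPS32 I-type load opcodes only.
LOAD_OPCODES = {0x20: "lb", 0x21: "lh", 0x23: "lw", 0x24: "lbu", 0x25: "lhu"}


def faulting_word(code_line):
    """Return the instruction word marked with <...> as an integer."""
    match = re.search(r"<([0-9a-fA-F]{8})>", code_line)
    if match is None:
        raise ValueError("no <...> marker in Code: line")
    return int(match.group(1), 16)


def decode_itype(word):
    """Split a 32-bit word into I-type fields: opcode, rs, rt, signed imm."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


def describe(word):
    """Render a load as 'op $rt, imm($rs)', or report an unknown opcode."""
    opcode, rs, rt, imm = decode_itype(word)
    name = LOAD_OPCODES.get(opcode)
    if name is None:
        return "opcode 0x%02x (not in the small load table)" % opcode
    return "%s $%d, %d($%d)" % (name, rt, imm, rs)


word = faulting_word(CODE_LINE)
opcode, rs, rt, imm = decode_itype(word)
print(describe(word))
# Base register value from the dump plus the immediate, wrapped to 32 bits.
print("effective address: 0x%08x" % ((REGS_0_TO_3[rs] + imm) & 0xFFFFFFFF))
```

The preceding word, 8c630000, decodes the same way as a load from offset 0 of $3 into $3, so the zero in $3 appears to have been loaded from memory just before the fault.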