Hi Martin,

/usr/src/linux/net/batman-adv/fragmentation.c is patched. I'm sorry I
overlooked your attachment. The new module is running; its size differs:

# lsmod
[ … ]
batman_adv 147774 0 # old
batman_adv 148030 0 # new
[ … ]

Batman-adv runs with:

# batctl if
fastd0: active
# batctl it
5000
# batctl ap
disabled
# batctl bl
enabled
# batctl dat
enabled
# batctl ag
enabled
# batctl b
disabled
# batctl f
enabled
# batctl nc
enabled
# batctl mark
0x00000000/0x00000000
# batctl mm
enabled
batctl ll
Error - can't open file '/sys/class/net/bat0/mesh/log_level': No such file or directory
[ … ]
batctl gw
server (announced bw: 100.0/100.0 MBit)

These were also the settings in effect during the kernel panic.

On Thursday, 20.11.2014, at 11:27 +0100, Martin Hundebøll wrote:
> On 2014-11-20 10:48, Philipp Psurek wrote:
[ … ]
> Yeah, most people compile out network coding. Has the bug disappeared
> after disabling NC ?

I can't tell for sure. NC was disabled for 20 hours. The bug has appeared
after anywhere from 1 minute to 72 hours. It depends on our users. To
reproduce the bug, NC is now enabled again.

> > On Thursday, 20.11.2014, at 09:32 +0100, Martin Hundebøll wrote:
> >> Thanks for your report. The bug is probably triggered by some bogus data
> >> in an incoming packet. I have created a small debug patch that will
> >> detect if this is the case, and print some debug info if so.
> >
> > Thank you for your work. I didn't find your patch on
> > http://git.open-mesh.org/batman-adv.git
>
> It was attached to my previous mail :)

I'm so sorry ;-) My fault.

> > I cannot analyse the packets because the gateway is part of an ISP
> > infrastructure and data privacy applies. But if your patch is able to
> > capture only the bogus data packet during the kernel panic, there
> > shouldn't be any problems, I think.
>
> My debug patch should only print the header of the packet causing the
> panic, so no problems with privacy here. (But you should probably check
> the output before mailing it to a public list...)

OK, thanks for that.

[ … ]
> I am running with NC on my machines in the lab and haven't seen this
> frag-issue before. I have seen a similar issue (wrong size value in the
> header) in another context though, but this wasn't due to either network
> coding or fragmentation.

Well, the lab is peaceful, but out in the wild there are evil data packets.

> Would you mind sending me your fastd config (without the key), so that I
> can try to reproduce this in my VMs?

Not at all. Here is the censored /etc/fastd/fastd.conf:

#---8<---8<---8<---8<---8<---8<----
bind :;
include "secret.conf";
include peers from "peers/wupper";
include peers from "testpeers/wupper";
include peers from "servers/wupper";
interface "fastd0";
log level warn;
method "salsa2012+gmac"; #### doesn't have anything to do with the bug, also seen with fastd v14
#### not used yet but with the new firmware: method "salsa2012+umac";
mtu 1426;
on up "
 ip link set address dev $INTERFACE
 ip link set up dev $INTERFACE
 modprobe batman-adv
 batctl if add fastd0
 batctl it 5000
 batctl bl enable
 batctl gw client ### gw will be changed later to server 100000/100000
 ip link set up dev bat0
 ip addr add 10.3./16 broadcast 10.3.255.255 dev bat0
 ip addr add 10.3./16 broadcast 10.3.255.255 dev bat0
 ip addr add fda0:747e:ab29:e1ba:/64 dev bat0
 ip route add 10.3.0.0/16 dev bat0 proto kernel scope link src 10.3.
 alfred -i bat0 -m > /dev/null 2>&1 &
 batadv-vis -i bat0 -s > /dev/null 2>&1 &
";
#---8<---8<---8<---8<---8<---8<----EOF

*) Now I see there is a different IP. This IP does not belong to this
machine and, both during the kernel panic and now, to no machine in the
Batman cloud.

wolke linux # /etc/init.d/fastd start
fastd ...
RTNETLINK answers: Invalid argument
#### now I know why ;-) but to reproduce the bug I won't change it

Then these commands are executed:

#---8<---8<---8<---8<---8<---8<----
ip tunnel add tun-ffw-w07 mode ipip remote local
ip addr add /31 dev tun-ffw-w07
ip tunnel change tun-ffw-w07 ttl 64
ip link set mtu 1400 dev tun-ffw-w07
ip link set dev tun-ffw-w07 up
ip rule add from /31 table 16
ip rule add iif bat0 table 16
ip rule add from all to lookup 16
ip route add default via \
 dev tun-ffw-w07 table 16
ip route add /31 dev tun-ffw-w07 table 16
# bat doesn't need any address, but the error occurs also with scope
# link
ip addr flush dev fastd0
iptables -t nat \
 -A POSTROUTING \
 -o tun-ffw-w07 ! -s /31 \
 -j SNAT --to
iptables -A FORWARD -p tcp \
 --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# yes, I know … but some services in the net do not like ICMP
# http://lartc.org/howto/lartc.cookbook.mtu-mss.html
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0
/etc/local.d/kdump.start
/etc/init.d/dhcpd restart
/etc/init.d/vnstatd restart
/etc/init.d/named restart
/etc/init.d/apache2 restart
batctl gw server 100000/100000
#---8<---8<---8<---8<---8<---8<----EOF

Now we have to wait until “prime time” or the weekend. I always used to
hope “please don't crash”, but now it's different ;-) I hope that
afterwards you can reproduce the bug and fix it.

Best regards
Philipp
________________________
Freifunk Rheinland e. V.
– Funkzelle Wuppertal –