public inbox for b.a.t.m.a.n@lists.open-mesh.org
From: Philipp Psurek <philipp.psurek@gmail.com>
To: b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] kernel BUG at net/core/skbuff.c:100
Date: Thu, 20 Nov 2014 13:22:29 +0100	[thread overview]
Message-ID: <1416486149.2747.9.camel@gmail.com> (raw)
In-Reply-To: <546DC214.6050908@hundeboll.net>

Hi Martin

/usr/src/linux/net/batman-adv/fragmentation.c is patched. I'm sorry I
overlooked your attachment. The new module is running; its size differs:

# lsmod
[ … ]
batman_adv            147774  0 # old
batman_adv            148030  0 # new
[ … ]


Batman-adv runs with

# batctl if
fastd0: active

# batctl it
5000

# batctl ap
disabled

# batctl bl
enabled
# batctl dat
enabled

# batctl ag
enabled

# batctl b
disabled

# batctl f
enabled

# batctl nc
enabled

# batctl mark
0x00000000/0x00000000

# batctl mm
enabled

# batctl ll
Error - can't open file '/sys/class/net/bat0/mesh/log_level': No such
file or directory [ … ]

# batctl gw
server (announced bw: 100.0/100.0 MBit)

These are also the options that were set during the kernel panic.
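
The settings listed above can be collected in one pass. The following is only a sketch: it loops over the same batctl abbreviations used in this mail (`ll` is left out because log_level is not available on this build), and the `BATCTL` variable is a hypothetical override added here so the loop can be tried on a machine without batman-adv.

```shell
#!/bin/sh
# Print every batman-adv setting from this mail on one line each,
# prefixed with the batctl abbreviation that reports it.
# BATCTL is a hypothetical override (not a batctl feature); it defaults to batctl.
dump_batctl_settings() {
    for opt in if it ap bl dat ag b f nc mark mm gw; do
        printf '%s: ' "$opt"
        # Join multi-line answers into one line; keep error text in the report.
        "${BATCTL:-batctl}" "$opt" 2>&1 | tr '\n' ' '
        printf '\n'
    done
}
```

Running `dump_batctl_settings > settings.txt` as root before and after a crash would make it easy to diff the two states.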

On Thursday, 20.11.2014, at 11:27 +0100, Martin Hundebøll wrote:
> On 2014-11-20 10:48, Philipp Psurek wrote:

[ … ]

> Yeah, most people compile out network coding. Has the bug disappeared 
> after disabling NC ?

I can't tell for sure. NC has been disabled for 20 hours. The bug took
anywhere from 1 minute to 72 hours to appear, depending on our users. To
reproduce the bug, NC is enabled again.

> > On Thursday, 20.11.2014, at 09:32 +0100, Martin Hundebøll wrote:
> >> Thanks for your report. The bug is probably triggered by some bogus data
> >> in an incoming packet. I have created a small debug patch that will
> >> detect if this is the case, and print some debug info if so.
> >
> > Thank you for your work. I didn't find your Patch on
> > http://git.open-mesh.org/batman-adv.git
> 
> It was attached to my previous mail :)

I'm so sorry ;-) my fault

> > I cannot analyse the packets because the gateway is part of an ISP
> > infrastructure and data privacy applies. But if your patch can capture
> > only the bogus data packet during the kernel panic, there shouldn't be
> > any problems, I think.
> 
> My debug patch should only print the header of the packet causing the 
> panic, so no problems with privacy here. (But you should probably check 
> the output before mailing it to a public list...)

OK, thanks for that

[ … ]

> I am running with NC on my machines in the lab and haven't seen this 
> frag-issue before. I have seen a similar issue (wrong size value in the 
> header) in another context though, but this wasn't due to either network 
> coding or fragmentation.

Well, the lab is peaceful, but out in the wild there are evil data
packets.

> Would you mind sending me your fastd config (without the key), so that I 
> can try to reproduce this in my VMs?

Not at all. Here is the censored /etc/fastd/fastd.conf

#---8<---8<---8<---8<---8<---8<----
bind <my_publicIP>:<my_fastdPORT>;
include "secret.conf";
include peers from "peers/wupper";
include peers from "testpeers/wupper";
include peers from "servers/wupper";
interface "fastd0";
log level warn;
method "salsa2012+gmac";
#### unrelated to the bug, which was also seen with fastd v14;
#### not used yet, only with the new firmware:
method "salsa2012+umac";
mtu 1426;

on up "
 ip link set address <MAC_ADDRESS> dev $INTERFACE
 ip link set up dev $INTERFACE
 modprobe batman-adv
 batctl if add fastd0
 batctl it 5000
 batctl bl enable
 batctl gw client
 ### gw will be changed later to server 100000/100000
 ip link set up dev bat0
 ip addr add 10.3.<IP>/16 broadcast 10.3.255.255 dev bat0
 ip addr add 10.3.<anotherIP>/16 broadcast 10.3.255.255 dev bat0
 ip addr add fda0:747e:ab29:e1ba:<IPv6_IP>/64 dev bat0
 ip route add 10.3.0.0/16 dev bat0 proto kernel scope link src 10.3.<wrong_IP*)>
 alfred -i bat0 -m > /dev/null 2>&1 &
 batadv-vis -i bat0 -s > /dev/null 2>&1 &
";
#---8<---8<---8<---8<---8<---8<----EOF

*) Now I see that this is a different IP. It does not belong to this
machine, and it belonged to no machine in the Batman cloud, neither
during the kernel panic nor now.

wolke linux # /etc/init.d/fastd start fastd ...
RTNETLINK answers: Invalid argument
#### Now I know why ;-) but I won't change it, so that the bug stays reproducible
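
The "Invalid argument" answer is consistent with the kernel rejecting a route whose `src` is not a local address. A small check before the `ip route add` could catch this; the sketch below is an assumption about how one might do it, not part of the fastd config. The helper name `has_local_addr` and the address 10.3.0.1 are hypothetical.

```shell
#!/bin/sh
# has_local_addr ADDR
# Reads lines in the format of `ip -o -4 addr show` from stdin and succeeds
# if ADDR (without prefix length) is among the assigned addresses.
# In that format the fourth field looks like "10.3.0.1/16".
has_local_addr() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Example use (hypothetical address):
#   ip -o -4 addr show dev bat0 | has_local_addr 10.3.0.1 \
#       || echo "src address is not assigned to bat0" >&2
```

Reading from stdin keeps the helper usable on saved output as well as on a live system.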

Then these commands are executed:
#---8<---8<---8<---8<---8<---8<----
ip tunnel add tun-ffw-w07 mode ipip remote <remoteIP> local <myIP>
ip addr add <some_ISP_IP>/31 dev tun-ffw-w07
ip tunnel change tun-ffw-w07 ttl 64
ip link set mtu 1400 dev tun-ffw-w07
ip link set dev tun-ffw-w07 up

ip rule add from <some_ISP_IP>/31 table 16
ip rule add iif bat0 table 16
ip rule add from all to <some_ISP_IP_for_this_machine> lookup 16 

ip route add default via <some_ISP_IP_on_the_other_side> \
	dev tun-ffw-w07 table 16
ip route add <some_ISP_IP>/31 dev tun-ffw-w07 table 16

# bat doesn't need any address, but the error occurs also with scope
# link
ip addr flush dev fastd0

iptables -t nat \
	-A POSTROUTING \
	-o tun-ffw-w07 ! -s <some_ISP_IP>/31 \
	-j SNAT --to <some_ISP_IP_for_this_machine>
iptables -A FORWARD -p tcp \
	--tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# yes, I know … but some services in the net do not like ICMP
# http://lartc.org/howto/lartc.cookbook.mtu-mss.html

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0

/etc/local.d/kdump.start
/etc/init.d/dhcpd restart
/etc/init.d/vnstatd restart
/etc/init.d/named restart
/etc/init.d/apache2 restart
batctl gw server 100000/100000
#---8<---8<---8<---8<---8<---8<----EOF
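
For reference, `--clamp-mss-to-pmtu` rewrites the MSS in forwarded SYN packets to the path MTU minus 40 bytes for IPv4 (20 bytes IP header plus 20 bytes TCP header, no options). The arithmetic for the tunnel MTU set above can be sketched like this; the function name `ipv4_tcp_mss` is made up for illustration.

```shell
#!/bin/sh
# ipv4_tcp_mss MTU
# Largest TCP payload per segment for a given IPv4 MTU:
# MTU minus 20 bytes IPv4 header minus 20 bytes TCP header (no options).
ipv4_tcp_mss() {
    echo $(( $1 - 40 ))
}

ipv4_tcp_mss 1400   # MTU of tun-ffw-w07 in the commands above -> 1360
ipv4_tcp_mss 1500   # plain Ethernet, for comparison -> 1460
```

So TCP connections forwarded through tun-ffw-w07 should end up with an MSS of at most 1360, assuming no smaller MTU exists further along the path.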

Now we have to wait until “prime time” or the weekend. I always used to hope
“please don't crash”, but now it's different ;-) I hope that after that you
can reproduce the bug and fix it.

Best regards

Philipp
________________________
Freifunk Rheinland e. V.
– Funkzelle Wuppertal –

Thread overview: 15+ messages
2014-11-18 21:58 [B.A.T.M.A.N.] kernel BUG at net/core/skbuff.c:100 Philipp Psurek
2014-11-20  8:32 ` Martin Hundebøll
2014-11-20  9:48   ` Philipp Psurek
2014-11-20 10:27     ` Martin Hundebøll
2014-11-20 12:22       ` Philipp Psurek [this message]
2014-11-20 12:36         ` Martin Hundebøll
2014-11-21  8:40           ` Philipp Psurek
2014-11-22 20:39           ` Philipp Psurek
2014-11-24  8:24             ` Martin Hundebøll
2014-11-24 10:44               ` Philipp Psurek
2014-11-24 12:14                 ` Philipp Psurek
2014-11-24 21:15                   ` Philipp Psurek
2014-11-24 22:26                     ` Philipp Psurek
2014-11-25  0:22                       ` Philipp Psurek
2014-11-25 10:17                         ` Philipp Psurek
