From: Philipp Psurek <philipp.psurek@gmail.com>
To: b.a.t.m.a.n@lists.open-mesh.org
Subject: Re: [B.A.T.M.A.N.] kernel BUG at net/core/skbuff.c:100
Date: Thu, 20 Nov 2014 13:22:29 +0100
Message-ID: <1416486149.2747.9.camel@gmail.com>
In-Reply-To: <546DC214.6050908@hundeboll.net>

Hi Martin

/usr/src/linux/net/batman-adv/fragmentation.c is patched. I'm sorry I
overlooked your attachment. The new module is running; its size differs:

# lsmod
[ … ]
batman_adv            147774  0 # old
batman_adv            148030  0 # new
[ … ]


Batman-adv runs with

# batctl if
fastd0: active

# batctl it
5000

# batctl ap
disabled

# batctl bl
enabled
# batctl dat
enabled

# batctl ag
enabled

# batctl b
disabled

# batctl f
enabled

# batctl nc
enabled

# batctl mark
0x00000000/0x00000000

# batctl mm
enabled

# batctl ll
Error - can't open file '/sys/class/net/bat0/mesh/log_level': No such
file or directory [ … ]

# batctl gw
server (announced bw: 100.0/100.0 MBit)

These were also the settings in effect during the kernel panic.
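
For later reports, the settings listed above could be collected in one
go. A minimal sketch, using only the batctl subcommands already shown in
this mail; the function name dump_batctl_settings and the BATCTL
override variable are made up for illustration:

```shell
#!/bin/sh
# Print every batctl setting listed above, one block per subcommand.
# BATCTL is a hypothetical override so the loop can be tried on a
# machine without batman-adv loaded.
dump_batctl_settings() {
    for opt in if it ap bl dat ag b f nc mark mm ll gw; do
        printf '# batctl %s\n' "$opt"
        # A failing subcommand (such as "ll" above) must not abort the dump.
        "${BATCTL:-batctl}" "$opt" 2>&1 || true
    done
}
```

Calling dump_batctl_settings > batctl-settings.txt right after a reboot
would give a snapshot to attach to a report.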

Am Donnerstag, den 20.11.2014, 11:27 +0100 schrieb Martin Hundebøll:
> On 2014-11-20 10:48, Philipp Psurek wrote:

[ … ]

> Yeah, most people compile out network coding. Has the bug disappeared 
> after disabling NC ?

I can't tell for sure. NC has been disabled for 20 hours now. The bug
has taken anywhere from 1 minute to 72 hours to appear; it depends on
our users. To reproduce the bug, I have enabled NC again.

> > Am Donnerstag, den 20.11.2014, 09:32 +0100 schrieb Martin Hundebøll:
> >> Thanks for your report. The bug is probably triggered by some bogus data
> >> in an incoming packet. I have created a small debug patch that will
> >> detect if this is the case, and print some debug info if so.
> >
> > Thank you for your work. I didn't find your Patch on
> > http://git.open-mesh.org/batman-adv.git
> 
> It was attached to my previous mail :)

I'm so sorry ;-) my fault

> > I cannot analyse the packets because the gateway is part of an ISP
> > infrastructure and data privacy applies. But if your patch can catch
> > only the bogus data packet during the kernel panic, there shouldn't
> > be any problems, I think.
> 
> My debug patch should only print the header of the packet causing the 
> panic, so no problems with privacy here. (But you should probably check 
> the output before mailing it to a public list...)

OK, thanks for that

[ … ]

> I am running with NC on my machines in the lab and haven't seen this 
> frag-issue before. I have seen a similar issue (wrong size value in the 
> header) in another context though, but this wasn't due to either network 
> coding or fragmentation.

Well, the lab is peaceful, but out in the wild there are evil data
packets.

> Would you mind sending me your fastd config (without the key), so that I 
> can try to reproduce this in my VMs?

Not at all. Here is the censored /etc/fastd/fastd.conf

#---8<---8<---8<---8<---8<---8<----
bind <my_publicIP>:<my_fastdPORT>;
include "secret.conf";
include peers from "peers/wupper";
include peers from "testpeers/wupper";
include peers from "servers/wupper";
interface "fastd0";
log level warn;
method "salsa2012+gmac";
#### doesn't have anything to do with the bug, also seen with fastd v14
#### not used yet but with the new firmware:
method "salsa2012+umac";
mtu 1426;

on up "
 ip link set address <MAC_ADDRESS> dev $INTERFACE
 ip link set up dev $INTERFACE
 modprobe batman-adv
 batctl if add fastd0
 batctl it 5000
 batctl bl enable
 batctl gw client
 ### gw will be changed later to server 100000/100000
 ip link set up dev bat0
 ip addr add 10.3.<IP>/16 broadcast 10.3.255.255 dev bat0
 ip addr add 10.3.<anotherIP>/16 broadcast 10.3.255.255 dev bat0
 ip addr add fda0:747e:ab29:e1ba:<IPv6_IP>/64 dev bat0
 ip route add 10.3.0.0/16 dev bat0 proto kernel scope link src 10.3.<wrong_IP*)>
 alfred -i bat0 -m > /dev/null 2>&1 &
 batadv-vis -i bat0 -s > /dev/null 2>&1 &
";
#---8<---8<---8<---8<---8<---8<----EOF

*) Only now do I see that this is a different IP. It does not belong to
this machine, and neither during the kernel panic nor now does it belong
to any machine in the Batman cloud.

wolke linux # /etc/init.d/fastd start fastd ...
RTNETLINK answers: Invalid argument
#### now I know why ;-) but I'm not changing it, so the bug stays reproducible
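
The "Invalid argument" above most likely comes from the src address not
being configured on bat0 (that is my reading; I have not checked it
further). A small sketch of a check that could run before the
"ip route add"; has_addr is a made-up helper name, and it assumes the
usual one-line-per-address output of "ip -o -4 addr show", where the
fourth field is address/prefix:

```shell
#!/bin/sh
# has_addr ADDR: succeed if ADDR appears in "ip -o -4 addr show" output
# read from stdin. Hypothetical helper; expects field 4 to be
# "address/prefix".
has_addr() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (placeholder as in the config above):
#   ip -o -4 addr show dev bat0 | has_addr "10.3.<IP>" \
#       || echo "src address is not configured on bat0"
```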

Then these commands are executed:
#---8<---8<---8<---8<---8<---8<----
ip tunnel add tun-ffw-w07 mode ipip remote <remoteIP> local <myIP>
ip addr add <some_ISP_IP>/31 dev tun-ffw-w07
ip tunnel change tun-ffw-w07 ttl 64
ip link set mtu 1400 dev tun-ffw-w07
ip link set dev tun-ffw-w07 up

ip rule add from <some_ISP_IP>/31 table 16
ip rule add iif bat0 table 16
ip rule add from all to <some_ISP_IP_for_this_machine> lookup 16 

ip route add default via <some_ISP_IP_on_the_other_side> \
	dev tun-ffw-w07 table 16
ip route add <some_ISP_IP>/31 dev tun-ffw-w07 table 16

# bat doesn't need any address, but the error occurs also with scope
# link
ip addr flush dev fastd0

iptables -t nat \
	-A POSTROUTING \
	-o tun-ffw-w07 ! -s <some_ISP_IP>/31 \
	-j SNAT --to <some_ISP_IP_for_this_machine>
iptables -A FORWARD -p tcp \
	--tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# yes, I know … but some services in the net do not like ICMP
# http://lartc.org/howto/lartc.cookbook.mtu-mss.html

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0

/etc/local.d/kdump.start
/etc/init.d/dhcpd restart
/etc/init.d/vnstatd restart
/etc/init.d/named restart
/etc/init.d/apache2 restart
batctl gw server 100000/100000
#---8<---8<---8<---8<---8<---8<----EOF
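
As a rough cross-check of the MTU and MSS settings above: with
--clamp-mss-to-pmtu, the MSS for plain IPv4 TCP ends up 40 bytes below
the path MTU (20 bytes IPv4 header plus 20 bytes TCP header). A sketch
of that arithmetic, assuming IPv4 without IP options; mss_for_mtu is
just an illustrative name:

```shell
#!/bin/sh
# mss_for_mtu MTU: print the TCP MSS that fits an IPv4 path MTU
# (MTU minus 20 bytes IPv4 header minus 20 bytes TCP header).
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

mss_for_mtu 1400   # MTU of tun-ffw-w07; prints 1360
```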

Now we have to wait until “prime time” or the weekend. I always used to
hope “please don't crash”, but now it's different ;-) I hope that after
that you can reproduce the bug and fix it.

Best regards

Philipp
________________________
Freifunk Rheinland e. V.
– Funkzelle Wuppertal –
