* vhost+vlan CHECKSUM_PARTIAL bug
@ 2014-06-28 10:14 Bernhard M. Wiedemann
From: Bernhard M. Wiedemann @ 2014-06-28 10:14 UTC (permalink / raw)
  To: netdev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I tracked down and built a reproducer for a well-hidden but ugly bug
that we hit by accident.
It occurs when TCP traffic is sent over vhost/tap with a VLAN into a
KVM VM, and the VM then forwards it (using DNAT+SNAT in my reproducer)
out of an interface that cannot pass on the CHECKSUM_PARTIAL
optimization. The VM therefore has to calculate the TCP checksum
itself, but it writes the result 4 bytes too early in the outgoing
packet. 4 bytes is exactly the offset added by the VLAN header.
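
To make the 4-byte arithmetic concrete, here is a minimal sketch in
shell. It assumes a 14-byte Ethernet header, an IPv4 header without
options (20 bytes) and a single 4-byte 802.1Q tag. The helper name
csum_pos is made up for illustration and is not a kernel function. The
TCP checksum field sits 16 bytes into the TCP header, which is where
csum_offset=16 comes from.

```shell
#!/bin/sh
# Illustrative arithmetic only; csum_pos is a hypothetical helper name.
ETH_HLEN=14      # Ethernet header without a VLAN tag
VLAN_HLEN=4      # one 802.1Q tag
IP_HLEN=20       # IPv4 header without options (assumption)
TCP_CSUM_OFF=16  # offset of the checksum field inside the TCP header

# csum_pos <number of VLAN tags>
# Prints the byte position of the TCP checksum field, counted from the
# start of the Ethernet frame.
csum_pos() {
    echo $(( ETH_HLEN + $1 * VLAN_HLEN + IP_HLEN + TCP_CSUM_OFF ))
}

echo "untagged: $(csum_pos 0)"
echo "tagged:   $(csum_pos 1)"
```

The two positions differ by exactly the length of one VLAN tag, so a
checksum written without accounting for the tag lands 4 bytes before
the real checksum field, which matches the symptom described above.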

I built a vanilla 3.16.0-rc2+ kernel for testing, with an added patch
that you can find at
http://www.zq1.de/~bernhard/temp/bnc884706/
along with pcaps taken before and after the packets are corrupted
by the CHECKSUM_PARTIAL logic.


Here is the reproducer:


# run this in a directory with 500MB of free disk space
wget -nc http://www.zq1.de/~bernhard/temp/sp3-mini.qcow2
# VM login is root:linux
# iptables rules are in /usr/local/sbin/
echo '#!/bin/sh
t=$1;
vconfig add $t 300
ifconfig $t up
ifconfig $t.300 192.168.77.1/24
' >myifup77
chmod a+x myifup77
echo '#!/bin/sh
t=$1;
ifconfig $t 192.168.76.1/24
' >myifup76
chmod a+x myifup76

# as root:
qemu-kvm -drive file=sp3-mini.qcow2,if=virtio \
  -net nic,model=rtl8139,macaddr=52:54:00:12:34:33 \
  -net tap,script=myifup76,ifname=tap76 -daemonize -m 1000 -vnc :99 \
  -netdev type=tap,id=guest0,script=myifup77,vhost=on,ifname=tap77 \
  -device virtio-net-pci,netdev=guest0,mac=52:54:00:12:34:32
sleep 50 # wait for VM boot
/etc/init.d/apache2 start # or any other webserver
ethtool -K tap77 tx off
curl 192.168.77.2 # this should work
ethtool -K tap77 tx on
curl 192.168.77.2 # this fails here reproducibly
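
To double-check which offload state is active and whether a capture
really contains bad checksums, something along these lines may help.
It is only a sketch: capture.pcap is a placeholder file name,
count_bad_cksum is a hypothetical helper, and it relies on tcpdump -vv
marking bad checksums with "(incorrect". Keep in mind that a capture
taken on a sending machine with TX checksum offload enabled normally
shows incorrect checksums, because they have not been filled in yet.

```shell
#!/bin/sh
# count_bad_cksum: hypothetical helper. Reads `tcpdump -vv` text output
# on stdin and prints the number of lines flagged as having an
# incorrect checksum. `|| true` keeps the exit status 0 when grep
# finds no match (it still prints 0 in that case).
count_bad_cksum() {
    grep -c '(incorrect' || true
}

# Example use (needs root, ethtool and tcpdump; capture.pcap is a
# placeholder name):
#   ethtool -k tap77 | grep tx-checksumming
#   tcpdump -nn -vv -r capture.pcap tcp | count_bad_cksum
```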


The debug output showed:
on host: tun csum_start=34 csum_offset=16 headroom=230
on VM:   skb_checksum_help csum_start=30 csum_offset=16 headroom=68 partial=1

This suggests that either skb->csum_start or skb_headroom is off by 4
within the VM.

I am not knowledgeable enough in the Linux networking code to debug or
fix this further, so I am asking the experts here.
Please help.

Ciao
Bernhard M.
- --
cloud software developer at SUSE
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlOulXEACgkQSTYLOx37oWQx7QCg074/0i+mkJnLRzG42T8zBifh
tTsAnRFzNonYeBCCpag1ZqR4DcfPV/46
=BBOd
-----END PGP SIGNATURE-----
