Networking messed up, bad checksum, incorrect length

public inbox for linux-kernel@vger.kernel.org

* Networking messed up, bad checksum, incorrect length
@ 2006-10-27  1:53 Carlos Velasco
  2006-10-27  2:51 ` Carlos Velasco
  2006-10-27  3:37 ` Herbert Xu
  0 siblings, 2 replies; 8+ messages in thread
From: Carlos Velasco @ 2006-10-27  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-net

Hello,

First of all, sorry for the long mail, but I have run into an
intricate problem.

I'm using kernel 2.6.18.1 on Intel EM64T.
Linux Merak 2.6.18.1 #1 SMP Thu Oct 26 16:19:26 CEST 2006 x86_64 x86_64
x86_64 GNU/Linux

I am seeing really bad networking behavior and would appreciate any
help.

I will tell the story from the beginning.

I saw some problems sending email and tracked them down to Netfilter
dropping some packets.

As you can see, all of them are ACK packets sent in reply to packets
belonging to an email being sent over SMTP:

Oct 26 18:58:04 kern:warning FIREWALL:INPUT IN=eth0 OUT=
MAC=00:13:21:cc:54:a6:00:0d:bc:a0:e2:82:08:00 SRC=193.147.150.12
DST=192.168.128.182 LEN=64 TOS=0x00 PREC=0x00 TTL=54 ID=31520 DF
PROTO=TCP SPT=25 DPT=60596 WINDOW=32448 RES=0x00 ACK URGP=0

Oct 26 18:58:04 kern:warning FIREWALL:INPUT IN=eth0 OUT=
MAC=00:13:21:cc:54:a6:00:0d:bc:a0:e2:82:08:00 SRC=193.147.150.12
DST=192.168.128.182 LEN=64 TOS=0x00 PREC=0x00 TTL=54 ID=31521 DF
PROTO=TCP SPT=25 DPT=60596 WINDOW=32448 RES=0x00 ACK URGP=0
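
Log lines like these are easier to compare once split into fields. A minimal sketch, assuming the usual space-separated KEY=VALUE layout of the LOG target with bare words such as DF and ACK acting as flags; `parse_netfilter_log` is a hypothetical helper, not an existing tool:

```python
def parse_netfilter_log(line):
    """Split a netfilter LOG line into KEY=VALUE fields and bare flags."""
    fields, flags = {}, []
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value          # e.g. "SRC" -> "193.147.150.12"
        else:
            flags.append(token)          # e.g. "DF", "ACK"
    return fields, flags


# First logged packet from above, starting at IN= (timestamp and prefix omitted).
sample = (
    "IN=eth0 OUT= MAC=00:13:21:cc:54:a6:00:0d:bc:a0:e2:82:08:00 "
    "SRC=193.147.150.12 DST=192.168.128.182 LEN=64 TOS=0x00 PREC=0x00 "
    "TTL=54 ID=31520 DF PROTO=TCP SPT=25 DPT=60596 WINDOW=32448 RES=0x00 "
    "ACK URGP=0"
)
fields, flags = parse_netfilter_log(sample)
print(fields["SRC"], fields["SPT"], fields["DPT"], fields["LEN"], flags)
```

Here the flags come out as DF and ACK only, so the dropped packet is a pure ACK coming back from port 25 of the SMTP server.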


To track down the problem, I opened the firewall and everything worked fine.

So I went looking for the real problem. The first step was to take
sniffer traces, and here is the surprise.

Packets going to destination SMTP server
========================================
# tcpdump -n -e dst flash.cnio.es
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
03:12:15.878929 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 74: 192.168.128.182.59061 > 193.147.150.12.25: S
508337957:508337957(0) win 5040 <mss 1260,sackOK,timestamp 9642740
0,nop,wscale 7>
03:12:15.927479 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 66: 192.168.128.182.59061 > 193.147.150.12.25: . ack
1032773455 win 40 <nop,nop,timestamp 9642752 425655092>
03:12:16.000764 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 66: 192.168.128.182.59061 > 193.147.150.12.25: . ack 39
win 40 <nop,nop,timestamp 9642771 425655100>
03:12:16.000811 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 95: 192.168.128.182.59061 > 193.147.150.12.25: P
0:29(29) ack 39 win 40 <nop,nop,timestamp 9642771 425655100>
03:12:16.046904 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 183: 192.168.128.182.59061 > 193.147.150.12.25: P
29:146(117) ack 163 win 40 <nop,nop,timestamp 9642782 425655104>
03:12:21.500477 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 2562: 192.168.128.182.59061 > 193.147.150.12.25: .
146:2642(2496) ack 228 win 40 <nop,nop,timestamp 9644145 425655650>

LENGTH:2562 !!

But it gets worse and worse...

03:12:22.185669 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 10050: 192.168.128.182.59061 > 193.147.150.12.25: .
52562:62546(9984) ack 228 win 40 <nop,nop,timestamp 9644317 425655718>

LENGTH: 10050 !!

03:12:22.345090 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 6306: 192.168.128.182.59061 > 193.147.150.12.25: .
62546:68786(6240) ack 228 win 40 <nop,nop,timestamp 9644357 425655734>

LENGTH: 6306 !!

03:12:22.484863 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
(0x0800), length 12546: 192.168.128.182.59061 > 193.147.150.12.25: .
68786:81266(12480) ack 228 win 40 <nop,nop,timestamp 9644391 425655748>

LENGTH: 12546 !!

How can this be? The MTU is set to 1300 on the interface:

# ip link show
1: lo: <LOOPBACK,UP,10000> mtu 1300 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1300 qdisc pfifo_fast qlen 1000
    link/ether 00:13:21:cc:54:a6 brd ff:ff:ff:ff:ff:ff
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0


And also in the default route:

# ip route show
192.168.128.0/24 dev eth0  proto kernel  scope link  src 192.168.128.182
default via 192.168.128.200 dev eth0  mtu 1300


Worse still, I wrote the full sniffer traces to a file for further
analysis with Ethereal, and there I see that these long packets are not
only above the MTU, they also all have bad TCP checksums.
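
For reference, the value Ethereal checks is the standard ones'-complement Internet checksum (RFC 1071), computed over a pseudo-header plus the TCP segment. A minimal sketch of the core sum follows; the function name is illustrative and the input is the worked example from RFC 1071, not data from these traces. One common reason for a capture on the sending host to show bad checksums is that the field is left for the NIC to fill in, but that is an assumption here, since nothing in this message establishes it.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words, as in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Worked example bytes from RFC 1071: 00 01 f2 03 f4 f5 f6 f7
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```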

But why do these packets traverse the network when I open the
firewall? At that length they could not cross the Internet
without being dropped. This can't be real.

So I used port mirroring on the switch attached to this server and took
sniffer traces of the Ethernet port to see whether tcpdump/libpcap is
confused. The REAL packets are all at most 1300 bytes long (IP),
which means that the sniffer traces taken on the server are
NOT the real packets that go out through the network interface.

===
Analysis:
My thinking is that libpcap captures the packets before the kernel
fragments them to meet the MTU limit, and that the same goes for
Netfilter. If Netfilter doesn't see the ACKs as RELATED to the packets
sent, they get dropped, and that is exactly what I see.
===

Some more info about versions and configurations:

tcpdump version 3.9.5
libpcap version 0.9.5

eth0: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCI Express)
10/100/1000BaseT Ethernet 00:13:21:cc:54:a6
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1]
TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]

# ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes

# sysctl -a | grep -i mtu
net.ipv6.conf.default.mtu = 1280
net.ipv6.conf.all.mtu = 1280
net.ipv6.conf.eth0.mtu = 1300
net.ipv6.conf.lo.mtu = 1300
error: "Operation not permitted" reading key "net.ipv6.route.flush"
net.ipv6.route.mtu_expires = 600
error: "Operation not permitted" reading key "net.ipv4.route.flush"
net.ipv4.tcp_mtu_probing = 0
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.ip_no_pmtu_disc = 0


I have uploaded the sniffer traces to the following URLs.

Sniffer traces taken on the server
(tcpdump -s 0 -w smtptrace.merak.pcap host flash.cnio.es):
http://www.nimastelecom.com/smtptraces/smtptrace.merak.pcap

Sniffer traces taken through port mirroring in the switch
(ethereal with filter: host flash.cnio.es):
http://www.nimastelecom.com/smtptraces/smtptrace_portmirror.pcap


Any help with this problem would be really appreciated.

Regards,
Carlos Velasco
CCNP & CCDP Cisco Certified Network Professional.

* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  1:53 Networking messed up, bad checksum, incorrect length Carlos Velasco
@ 2006-10-27  2:51 ` Carlos Velasco
  2006-10-27  3:37 ` Herbert Xu
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos Velasco @ 2006-10-27  2:51 UTC (permalink / raw)
  To: linux-kernel, linux-net

An Update.

When using kernel 2.6.16.29, I don't see the problem. Packets taken with
tcpdump/libpcap are the real packets.

Linux Merak 2.6.16.29 #1 SMP Fri Oct 27 04:33:56 CEST 2006 x86_64 x86_64
x86_64 GNU/Linux

Regards,
Carlos Velasco
CCNP & CCDP Cisco Certified Network Professional


* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  1:53 Networking messed up, bad checksum, incorrect length Carlos Velasco
  2006-10-27  2:51 ` Carlos Velasco
@ 2006-10-27  3:37 ` Herbert Xu
  2006-10-27  3:49   ` Carlos Velasco
  1 sibling, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-10-27  3:37 UTC (permalink / raw)
  To: Carlos Velasco; +Cc: linux-kernel, linux-net

Carlos Velasco <lkml@newipnet.com> wrote:
> 
> 03:12:22.484863 00:13:21:cc:54:a6 > 00:0d:bc:a0:e2:82, ethertype IPv4
> (0x0800), length 12546: 192.168.128.182.59061 > 193.147.150.12.25: .
> 68786:81266(12480) ack 228 win 40 <nop,nop,timestamp 9644391 425655748>
> 
> LENGTH: 12546 !!

> Worse still, I wrote the full sniffer traces to a file for further
> analysis with Ethereal, and there I see that these long packets are not
> only above the MTU, they also all have bad TCP checksums.

These packets look like normal TSO packets.  Linux will send a
packet containing more data than fits in a packet to the NIC.  The
NIC will then segment the packet for us.
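
The sizes in the trace fit this explanation. A rough check, assuming a 14-byte Ethernet header, a 20-byte IPv4 header without options, and a 32-byte TCP header (20 bytes plus the 12-byte nop,nop,timestamp option visible in the trace); `tso_segments` is an illustrative helper, not something from the kernel:

```python
ETH_HDR = 14        # Ethernet header, included in tcpdump -e "length"
IP_HDR = 20         # IPv4 header, assuming no IP options
TCP_HDR = 20 + 12   # TCP header plus the nop,nop,timestamp option
MTU = 1300          # MTU configured on eth0 in the report


def tso_segments(frame_len):
    """Return (payload, payload per wire packet, number of wire packets)."""
    payload = frame_len - ETH_HDR - IP_HDR - TCP_HDR
    per_segment = MTU - IP_HDR - TCP_HDR
    count = -(-payload // per_segment)   # ceiling division
    return payload, per_segment, count


for frame_len in (2562, 10050, 6306, 12546):
    print(frame_len, tso_segments(frame_len))
```

Under these assumptions every oversized frame carries a whole multiple of 1248 payload bytes (2, 8, 5 and 10 segments respectively), which is what one would expect if the NIC later cuts each one into MTU-sized packets.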

In order to see what really goes out, you'll need to run a packet
dump beyond the NIC.

If that is not possible, you can try disabling TSO with ethtool -K.
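
A small sketch of how that could be scripted. It assumes the `ethtool` binary is installed, that `ethtool -k <dev>` lists the offload settings and that `ethtool -K <dev> tso off` turns TSO off (this needs root). The helper names and the sample text are illustrative; the sample is not output captured from the machine in this thread.

```python
import subprocess


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO line of `ethtool -k` output, None if absent."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        # Older ethtool prints "tcp segmentation offload", newer versions hyphenate it.
        if sep and name.strip().replace("-", " ") == "tcp segmentation offload":
            words = value.split()
            return bool(words) and words[0] == "on"
    return None


def disable_tso(dev):
    """Run `ethtool -K <dev> tso off`; requires root and a real interface."""
    subprocess.run(["ethtool", "-K", dev, "tso", "off"], check=True)


# Illustrative text only.
sample = "Offload parameters for eth0:\ntcp segmentation offload: on\n"
print(tso_enabled(sample))
```

Parsing is kept separate from the command that changes the setting, so the check can be tried on saved output without touching the interface.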

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  3:37 ` Herbert Xu
@ 2006-10-27  3:49   ` Carlos Velasco
  2006-10-27  4:00     ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos Velasco @ 2006-10-27  3:49 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-kernel, linux-net

Herbert Xu wrote:

> These packets look like normal TSO packets.  Linux will send a
> packet containing more data than fits in a packet to the NIC.  The
> NIC will then segment the packet for us.
> 
> If that is not possible, you can try disabling TSO with ethtool -K.

I see...
I have tracked the change to 2.6.18.
With kernel 2.6.17.14 and below, tcpdump/libpcap displays the
real packets.
Could this be a change in the tg3 driver to support TSO, committed in 2.6.18?

And a question... how does this TSO affect netfilter?

> In order to see what really goes out, you'll need to run a packet
> dump beyond the NIC.

They are here:

Sniffer traces taken through port mirroring in the switch
(ethereal with filter: host flash.cnio.es):
http://www.nimastelecom.com/smtptraces/smtptrace_portmirror.pcap

But I am still wondering how this could affect Netfilter... my real problem
is that Netfilter drops ACK packets because it doesn't see the
connection as established/related.

I will try to do some more research.

Regards,
Carlos Velasco
CCNP & CCDP Cisco Certified Network Professional

* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  3:49   ` Carlos Velasco
@ 2006-10-27  4:00     ` Herbert Xu
  2006-10-27  4:03       ` Carlos Velasco
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-10-27  4:00 UTC (permalink / raw)
  To: Carlos Velasco; +Cc: herbert, linux-kernel, linux-net

Carlos Velasco <lkml@newipnet.com> wrote:
> 
> And a question... how does this TSO affect netfilter?

netfilter will see the jumbo frames rather than the individual packets.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  4:00     ` Herbert Xu
@ 2006-10-27  4:03       ` Carlos Velasco
  2006-10-27  4:31         ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos Velasco @ 2006-10-27  4:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-kernel, linux-net

Herbert Xu wrote:

> netfilter will see the jumbo frames rather than the individual packets.

I wonder if this could affect netfilter's stateful TCP tracking
and cause the ACK drops I see.

Regards,
Carlos Velasco
CCNP & CCDP Cisco Certified Network Professional


* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  4:03       ` Carlos Velasco
@ 2006-10-27  4:31         ` Herbert Xu
  2006-10-29 22:40           ` Carlos Velasco
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-10-27  4:31 UTC (permalink / raw)
  To: Carlos Velasco; +Cc: herbert, linux-kernel, linux-net

Carlos Velasco <lkml@newipnet.com> wrote:
> 
>> netfilter will see the jumbo frames rather than the individual packets.
> 
> I wonder if this could affect netfilter's stateful TCP tracking
> and cause the ACK drops I see.

It shouldn't if netfilter is behaving correctly.  The whole point of
TSO is that the packet is an otherwise valid TCP/IP packet.  The only
thing special about it is that it exceeds the MTU of the output device.

That's what allows us to implement TSO without touching all parts of
the networking stack.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Networking messed up, bad checksum, incorrect length
  2006-10-27  4:31         ` Herbert Xu
@ 2006-10-29 22:40           ` Carlos Velasco
  0 siblings, 0 replies; 8+ messages in thread
From: Carlos Velasco @ 2006-10-29 22:40 UTC (permalink / raw)
  To: linux-kernel, linux-net

Following up on this issue: disabling TSO does _not_ solve the problem of
the connection being lost during an outbound email.

It seems like a bug in Netfilter. I am not sure how to debug TCP conntrack,
but it looks as if Netfilter lost the ESTABLISHED state for this connection.

Looking at the traces, the only strange thing is that with
tcpdump/libpcap on the server I see packets being sent before the
previous ACK is received. I don't think this is possible, as the ACK
number of that packet is needed before sending the next.

I'm attaching the latest traces, taken with TSO disabled.

smtptrace_flash1_bad2.pcap <- packets taken on the destination server
smtptrace_merak_bad2.pcap <- packets taken on the source server (the
one with the problem).

Analyzing the traces, it seems that a data packet is lost in transit to
the destination server, so the destination server sends DUP ACKs to
force the source server to resend the lost packet. But Netfilter is
filtering these DUP ACKs (as shown in the kernel logs), so the source
server gives up and resets the connection.
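
The duplicate-ACK pattern described here can be spotted mechanically in a trace. A minimal sketch, assuming the ACKs from the destination server have already been extracted as (ack number, payload length) pairs; the function name and the sample values are hypothetical, and window updates and SACK blocks are ignored:

```python
def count_dup_acks(segments):
    """Count repeated zero-payload ACKs per ack number for one direction.

    segments: iterable of (ack_number, payload_len) tuples in arrival order.
    """
    dup_counts = {}
    last_ack = None
    for ack, payload_len in segments:
        # A duplicate ACK carries no data and repeats the previous ack number.
        if payload_len == 0 and ack == last_ack:
            dup_counts[ack] = dup_counts.get(ack, 0) + 1
        last_ack = ack
    return dup_counts


# Hypothetical values, not taken from the attached pcap files.
received = [(1249, 0), (2497, 0), (2497, 0), (2497, 0), (2497, 0)]
print(count_dup_acks(received))  # {2497: 3}
```

Three duplicate ACKs are normally enough to trigger a fast retransmit on the sender (RFC 5681), so if they are dropped before reaching TCP, the sender has to fall back on its retransmission timer.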

I think this is a bug in the Netfilter code. Again, any help would be
appreciated.

Regards,
Carlos Velasco
CCNP & CCDP Cisco Certified Network Professional

[-- Attachment #2: smtptrace_merak_bad2.pcap --]
[-- Type: application/octet-stream, Size: 5503 bytes --]

[-- Attachment #3: smtptrace_flash1_bad2.pcap --]
[-- Type: application/octet-stream, Size: 6047 bytes --]
