Linux Netfilter discussions
* Lost packets, un-masqueraded retransmits
From: Phil Radden @ 2005-09-01 22:07 UTC
  To: netfilter

Hi all,

I'm seeing some very strange problems which I believe are at least half 
netfilter issues - my apologies if not!  I hope you can help me.

I have a Fedora Core 4 system (2.6.12-ish, iptables 1.3.0-ish) acting as a 
masquerading gateway between an ethernet network (MTU 1500) and a PPPoATM 
connection (MTU 1492).  
I've simplified things to a single iptables rule,
  -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
with net.ipv4.ip_forward=1, and can reproduce the problem.

Machines on the internal network are unable to access certain websites (the 
homepage of www.nationalrail.co.uk, for example).  It appears that large 
packets are being dropped, but also, whilst snooping to debug, I saw 
unmasqueraded packets being leaked out.

Snooping on the LAN address of the gateway I see (eg.):

No.     Time        Source                Destination           Protocol Info
      1 0.000000    192.168.31.191        213.174.202.41        TCP      1851 > http [SYN] Seq=0 Ack=0 Win=65535 Len=0 MSS=1460
      2 0.018139    213.174.202.41        192.168.31.191        TCP      http > 1851 [SYN, ACK] Seq=0 Ack=1 Win=32767 Len=0 MSS=16856
      3 0.018416    192.168.31.191        213.174.202.41        TCP      1851 > http [ACK] Seq=1 Ack=1 Win=65535 Len=0
      4 0.020234    192.168.31.191        213.174.202.41        HTTP     GET /live/styles/main/home.css HTTP/1.1
      5 0.051078    213.174.202.41        192.168.31.191        TCP      http > 1851 [ACK] Seq=1 Ack=408 Win=32767 Len=0
      6 0.054043    213.174.202.41        192.168.31.191        HTTP     Continuation or non-HTTP traffic
      7 0.054246    192.168.31.191        213.174.202.41        TCP      [TCP Dup ACK 3#1] 1851 > http [ACK] Seq=408 Ack=1 Win=65535 Len=0 SLE=1461 SRE=1674
     10 16.051440   192.168.31.191        213.174.202.41        TCP      1845 > http [FIN, ACK] Seq=0 Ack=0 Win=65535 Len=0
     11 54.377013   192.168.31.191        213.174.202.41        TCP      [TCP Retransmission] 1845 > http [FIN, ACK] Seq=0 Ack=0 Win=65535 Len=0

(packet 6 was seq 1461, ack 408, len 213).  As you can see, a 1460-byte 
packet went missing between packets 5 and 6.

Snooping on ppp0, I see nearly the same thing, with the internal IP address 
replaced by the external IP (correctly, by the masquerading rule).  Once 
again, the 1460-byte packet is missing.  However, and worryingly, the 
retransmitted packet, packet 11, and subsequent retransmissions, _appear_ 
(according to tcpdump and ethereal) to still have the original (RFC1918) 
source address.  Is this a config error, a reporting error or a bug? :)

Snooping at the endpoint (213.174.202.41 in this example) showed:

21:34:43.491939 85.210.143.231.1851 > 213.174.202.41.http: S 2346403573:2346403573(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
21:34:43.492005 213.174.202.41.http > 213.174.202.41.1851: S 4283534505:4283534505(0) ack 2346403574 win 32767 <mss 16856,nop,nop,sackOK> (DF)
21:34:43.509632 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 (DF)
21:34:43.524462 85.210.143.231.1851 > 213.174.202.41.http: P 1:408(407) ack 1 win 65535 (DF)
21:34:43.524484 213.174.202.41.http > 85.210.143.231.1851: . ack 408 win 32767 (DF)
21:34:43.525913 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
21:34:43.525935 213.174.202.41.http > 85.210.143.231.1851: P 1461:1674(213) ack 408 win 32767 (DF)
21:34:43.544692 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 <nop,nop,sack sack 1 {1461:1674} > (DF)
21:34:46.528362 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
21:34:52.527773 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
21:35:04.526655 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
21:35:28.524181 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)

Everything matches up, and you can see the 1460-byte packets going out but
never arriving.  The reason I think this part of the problem is
potentially netfilter-related, rather than a connection configuration
issue, is that the gateway box itself can access all of the affected sites
with no problem at all!

I'd be grateful for any suggestions, I've currently run out of ideas of 
things to try...

Thanks,
Phil




* Re: Lost packets, un-masqueraded retransmits
From: Grant Taylor @ 2005-09-02  7:31 UTC
  To: netfilter

> I'm seeing some very strange problems which I believe are at least half 
> netfilter issues - my apologies if not!  I hope you can help me.
> 
> I have a Fedora Core 4 system (2.6.12-ish, iptables 1.3.0-ish) acting as a 
> masquerading gateway between an ethernet network (MTU 1500) and a PPPoATM 
> connection (MTU 1492).  
> I've simplified things to a single iptables rule,
>   -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
> with net.ipv4.ip_forward=1, and can reproduce the problem.
> 
> Machines on the internal network are unable to access certain websites (the 
> homepage of www.nationalrail.co.uk, for example).  It appears that large 
> packets are being dropped, but also, whilst snooping to debug, I saw 
> unmasqueraded packets being leaked out.
> 
> Snooping on the LAN address of the gateway I see (eg.):
> 
> No.     Time        Source                Destination           Protocol Info
>       1 0.000000    192.168.31.191        213.174.202.41        TCP      1851 > http [SYN] Seq=0 Ack=0 Win=65535 Len=0 MSS=1460
>       2 0.018139    213.174.202.41        192.168.31.191        TCP      http > 1851 [SYN, ACK] Seq=0 Ack=1 Win=32767 Len=0 MSS=16856
>       3 0.018416    192.168.31.191        213.174.202.41        TCP      1851 > http [ACK] Seq=1 Ack=1 Win=65535 Len=0
>       4 0.020234    192.168.31.191        213.174.202.41        HTTP     GET /live/styles/main/home.css HTTP/1.1
>       5 0.051078    213.174.202.41        192.168.31.191        TCP      http > 1851 [ACK] Seq=1 Ack=408 Win=32767 Len=0
>       6 0.054043    213.174.202.41        192.168.31.191        HTTP     Continuation or non-HTTP traffic
>       7 0.054246    192.168.31.191        213.174.202.41        TCP      [TCP Dup ACK 3#1] 1851 > http [ACK] Seq=408 Ack=1 Win=65535 Len=0 SLE=1461 SRE=1674
>      10 16.051440   192.168.31.191        213.174.202.41        TCP      1845 > http [FIN, ACK] Seq=0 Ack=0 Win=65535 Len=0
>      11 54.377013   192.168.31.191        213.174.202.41        TCP      [TCP Retransmission] 1845 > http [FIN, ACK] Seq=0 Ack=0 Win=65535 Len=0
> 
> (packet 6 was seq 1461, ack 408, len 213).  As you can see, a 1460-byte 
> packet went missing between packets 5 and 6.
> 
> Snooping on ppp0, I see nearly the same thing, with the internal IP address 
> replaced by the external IP (correctly, by the masquerading rule).  Once 
> again, the 1460-byte packet is missing.  However, and worryingly, the 
> retransmitted packet, packet 11, and subsequent retransmissions, _appear_ 
> (according to tcpdump and ethereal) to still have the original (RFC1918) 
> source address.  Is this a config error, a reporting error or a bug? :)
> 
> Snooping at the endpoint (213.174.202.41 in this example) showed:
> 
> 21:34:43.491939 85.210.143.231.1851 > 213.174.202.41.http: S 2346403573:2346403573(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
> 21:34:43.492005 213.174.202.41.http > 213.174.202.41.1851: S 4283534505:4283534505(0) ack 2346403574 win 32767 <mss 16856,nop,nop,sackOK> (DF)
> 21:34:43.509632 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 (DF)
> 21:34:43.524462 85.210.143.231.1851 > 213.174.202.41.http: P 1:408(407) ack 1 win 65535 (DF)
> 21:34:43.524484 213.174.202.41.http > 85.210.143.231.1851: . ack 408 win 32767 (DF)
> 21:34:43.525913 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
> 21:34:43.525935 213.174.202.41.http > 85.210.143.231.1851: P 1461:1674(213) ack 408 win 32767 (DF)
> 21:34:43.544692 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 <nop,nop,sack sack 1 {1461:1674} > (DF)
> 21:34:46.528362 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
> 21:34:52.527773 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
> 21:35:04.526655 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
> 21:35:28.524181 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
> 
> Everything matches up, and you can see the 1460-byte packets going out but
> never arriving.  The reason I think this part of the problem is
> potentially netfilter-related, rather than a connection configuration
> issue, is that the gateway box itself can access all of the affected sites
> with no problem at all!
> 
> I'd be grateful for any suggestions, I've currently run out of ideas of 
> things to try...

The first thing that comes to mind (I know I have said this three or four times in the last week to various people on and off this list) is that Path MTU discovery is failing.  Notice that in the second traffic dump (the tcpdump output) the Don't Fragment (DF) flag is set on every packet.  I would be willing to bet that the returning traffic is being dropped because the full-size packets are too large for the 1492-byte MTU of the PPP over ATM link and, with DF set, cannot be fragmented to fit.

21:34:43.491939 85.210.143.231.1851 > 213.174.202.41.http: S 2346403573:2346403573(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
21:34:43.492005 213.174.202.41.http > 213.174.202.41.1851: S 4283534505:4283534505(0) ack 2346403574 win 32767 <mss 16856,nop,nop,sackOK> (DF)
21:34:43.509632 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 (DF)
21:34:43.524462 85.210.143.231.1851 > 213.174.202.41.http: P 1:408(407) ack 1 win 65535 (DF)
21:34:43.524484 213.174.202.41.http > 85.210.143.231.1851: . ack 408 win 32767 (DF)
21:34:43.525913 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
21:34:43.525935 213.174.202.41.http > 85.210.143.231.1851: P 1461:1674(213) ack 408 win 32767 (DF)
21:34:43.544692 85.210.143.231.1851 > 213.174.202.41.http: . ack 1 win 65535 <nop,nop,sack sack 1 {1461:1674} > (DF)

Notice that the next four packets are retransmissions of the third packet the server sent to the client (the 1460-byte segment 1:1461).

21:34:46.528362 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
--------------- This packet is approximately 3 seconds after the 1st transmission.
21:34:52.527773 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
--------------- This packet is approximately 6 seconds after the 2nd transmission.
21:35:04.526655 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
--------------- This packet is approximately 12 seconds after the 3rd transmission.
21:35:28.524181 213.174.202.41.http > 85.210.143.231.1851: . 1:1461(1460) ack 408 win 32767 (DF)
--------------- This packet is approximately 24 seconds after the 4th transmission.

Notice that five packets in your tcpdump output from the server are identical: the original 1460-byte segment and four retransmissions of it, all with the DF flag set.  Notice also that the delay between successive retransmissions doubles each time (roughly 3, 6, 12 and 24 seconds).  This is the server working through its TCP retransmission backoff timer, assuming that the path back to the client is congested.
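
The doubling can be checked directly from the timestamps.  The following is a small Python sketch, not part of any capture tooling: it assumes all five timestamps fall on the same day and simply takes the differences; `gaps` is an illustrative helper name.

```python
from datetime import datetime

# Timestamps of the five copies of segment 1:1461 in the endpoint dump.
STAMPS = [
    "21:34:43.525913",
    "21:34:46.528362",
    "21:34:52.527773",
    "21:35:04.526655",
    "21:35:28.524181",
]


def gaps(stamps):
    """Seconds between consecutive timestamps (assumed to be on the same day)."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


intervals = gaps(STAMPS)
# Ratio of each interval to the one before it; a value near 2 means doubling.
ratios = [b / a for a, b in zip(intervals, intervals[1:])]

print("intervals (s):", [round(g, 3) for g in intervals])
print("ratios:       ", [round(r, 3) for r in ratios])
```

Every ratio coming out close to 2 is what exponential backoff of the retransmission timer looks like from the sender's side.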

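The "missing" packet you are referring to is that full-size segment: 1460 bytes of data plus 40 bytes of IP and TCP headers makes a 1500-byte packet, which is too big for a 1492-byte MTU.  The small 213-byte segment that follows fits and gets through, which is why the client SACKs 1461:1674 but never acknowledges 1:1461.  Since the DF flag is set, a router on the path cannot fragment the large packet; it has to drop it and send back an ICMP "fragmentation needed" error, and if that error never reaches the server, the server just keeps retransmitting the same oversized packet.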
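
The size arithmetic behind this can be checked with a minimal Python sketch.  It assumes 20-byte IPv4 and TCP headers with no options on the data segments; the function names are illustrative only, and the payload lengths and the 1492-byte MTU are taken from the dumps and the original post.

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options on data segments.
IP_HEADER = 20
TCP_HEADER = 20
LINK_MTU = 1492  # MTU of the PPP link reported in the original post


def ip_packet_size(tcp_payload_len):
    """Total IPv4 packet length for a TCP segment carrying this much data."""
    return IP_HEADER + TCP_HEADER + tcp_payload_len


def fits(tcp_payload_len, mtu=LINK_MTU):
    """True if the packet can cross the link without being fragmented."""
    return ip_packet_size(tcp_payload_len) <= mtu


# Payload lengths of the server-to-client segments seen in the endpoint dump.
for payload in (0, 1460, 213):
    size = ip_packet_size(payload)
    verdict = "fits" if fits(payload) else "too big; dropped if DF is set"
    print(f"payload {payload:4d} -> IP packet {size:4d} bytes: {verdict}")
```

Under these assumptions the largest payload that still fits is 1492 - 40 = 1452 bytes, which is the MSS the client would need to advertise.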

I would like to see tcpdump output from the client system to set against the server-side dump you provided, as comparing apples to apples is much easier.

As for a solution, try adding a TCPMSS target to your FORWARD chain, like so (the target only works on TCP SYN packets, so the rule has to match them):

iptables -t filter -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
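
To a first approximation, clamping the MSS to the path MTU does the following to the MSS option of a forwarded SYN.  This is a simplified Python model, not the kernel code: it assumes IPv4 with 40 bytes of IP and TCP header overhead and that the clamp only ever lowers the value; `clamp_mss` is an illustrative name.

```python
IPV4_TCP_OVERHEAD = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header


def clamp_mss(advertised_mss, path_mtu):
    """Model of MSS clamping: lower the advertised MSS to fit the path MTU,
    but never raise it."""
    return min(advertised_mss, path_mtu - IPV4_TCP_OVERHEAD)


client_syn_mss = 1460  # MSS in the client's SYN, from the capture
ppp_mtu = 1492         # MTU of the PPP link, from the original post

clamped = clamp_mss(client_syn_mss, ppp_mtu)
print("MSS after clamping:", clamped)
print("full-size packet fits the link:", clamped + IPV4_TCP_OVERHEAD <= ppp_mtu)
```

With the SYN rewritten this way, the server never sends a segment that needs more than the link can carry, so the lost ICMP errors stop mattering for TCP.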



Grant. . . .



* Re: Lost packets, un-masqueraded retransmits
From: Phil Radden @ 2005-09-02 11:55 UTC
  To: netfilter

On Fri, 2 Sep 2005, Grant Taylor wrote:
> The first thing that comes to mind is (I know that I have said this about
> 3 or 4 times in the last week to various people on and off this list) that
> your Path MTU value is killing things.  If you notice in the 2nd traffic
> dump (tcpdump output) the Don't Fragment (DF) flag is set on the packet.  
> I would be willing to bet that the returning traffic is being dropped b/c
> the packets are too large to fit in the frame of the PPP over ATM frame.

Brilliant - thank you for your detailed response Grant, you're completely 
correct.  The ICMP errors are clearly being dropped somewhere outside my 
control, and the backing off of the retries sending the full-size packet 
fits perfectly.

Knowing now what the issue is, I can find the millions of web pages about 
the same problem - apologies for bothering you with such a FAQ!  The MSS 
clamp works perfectly.

Any thoughts regarding the 'leaking packets', reaching the external network 
without getting masqueraded?  I'm monitoring now to see if I still get them, 
which I possibly won't since they all seemed to be retransmits - I'll post 
again if they're still around.  But here's the detail of one such packet, 
grabbed from ppp0 on the gateway:

0000  00 04 02 00 00 00 00 00 00 00 00 00 00 00 08 00   ................
0010  45 00 00 28 4e 16 40 00 7f 06 21 f8 c0 a8 1f bf   E..(N.@...!.....
0020  c2 42 e9 17 06 01 00 50 6b ba 7a 19 88 1d 82 de   .B.....Pk.z.....
0030  50 11 ff ff 2c f1 00 00                           P...,...

[unpacking details from ethereal...]
Frame 6655 (56 bytes on wire, 56 bytes captured)
    Arrival Time: Sep  2, 2005 12:31:01.442977000
    Time delta from previous packet: 0.300389000 seconds
    Time since reference or first frame: 882.838297000 seconds
    Frame Number: 6655
    Packet Length: 56 bytes
    Capture Length: 56 bytes
    Protocols in frame: sll:ip:tcp
Linux cooked capture
    Packet type: Sent by us (4)
    Link-layer address type: 512
    Link-layer address length: 0
    Source: <MISSING>
    Protocol: IP (0x0800)
Internet Protocol, Src: 192.168.31.191 (192.168.31.191), Dst: 194.66.233.23 (194.66.233.23)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 40
    Identification: 0x4e16 (19990)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 127
    Protocol: TCP (0x06)
    Header checksum: 0x21f8 [correct]
    Source: 192.168.31.191 (192.168.31.191)
    Destination: 194.66.233.23 (194.66.233.23)
Transmission Control Protocol, Src Port: 1537 (1537), Dst Port: http (80), Seq: 0, Ack: 0, Len: 0
    Source port: 1537 (1537)
    Destination port: http (80)
    Sequence number: 0    (relative sequence number)
    Acknowledgement number: 0    (relative ack number)
    Header length: 20 bytes
    Flags: 0x0011 (FIN, ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgment: Set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...1 = Fin: Set
    Window size: 65535
    Checksum: 0x2cf1 [correct]
    SEQ/ACK analysis
        TCP Analysis Flags
            This frame is a (suspected) retransmission
            The RTO for this segment was: 0.600510000 seconds
            RTO based on delta from frame: 6653
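
For anyone who wants to check the decode by hand, here is a short Python sketch that unpacks the hex dump above with the standard `struct` module.  It assumes the first 16 bytes are the Linux cooked-capture header (as the ethereal output indicates) and that neither the IP nor the TCP header carries options (both are reported as 20 bytes).

```python
import ipaddress
import struct

HEX_DUMP = """
00 04 02 00 00 00 00 00 00 00 00 00 00 00 08 00
45 00 00 28 4e 16 40 00 7f 06 21 f8 c0 a8 1f bf
c2 42 e9 17 06 01 00 50 6b ba 7a 19 88 1d 82 de
50 11 ff ff 2c f1 00 00
"""
frame = bytes.fromhex("".join(HEX_DUMP.split()))

SLL_LEN = 16  # assumed length of the Linux cooked-capture header
ip_hdr = frame[SLL_LEN:SLL_LEN + 20]
tcp_hdr = frame[SLL_LEN + 20:SLL_LEN + 40]

# IPv4 header fields, network byte order.
(ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, ip_csum,
 src_raw, dst_raw) = struct.unpack("!BBHHHBBH4s4s", ip_hdr)
# TCP header fields, network byte order.
(sport, dport, seq, ack, offset, tcp_flags, window, tcp_csum,
 urg) = struct.unpack("!HHIIBBHHH", tcp_hdr)

src = ipaddress.IPv4Address(src_raw)
dst = ipaddress.IPv4Address(dst_raw)


def header_checksum_ok(header):
    """One's-complement sum over a 20-byte IP header; 0xFFFF means valid."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


FLAG_NAMES = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK"}
flags = [name for bit, name in FLAG_NAMES.items() if tcp_flags & bit]

print(f"{src}:{sport} -> {dst}:{dport} flags={flags}")
print("DF set:", bool(flags_frag & 0x4000))
print("source is a private address:", src.is_private)
print("IP header checksum ok:", header_checksum_ok(ip_hdr))
```

A valid header checksum together with a private source address suggests the packet really left the gateway un-translated, rather than the capture tool misreporting it.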


Many thanks,
Phil



* Re: Lost packets, un-masqueraded retransmits
From: Grant Taylor @ 2005-09-03  6:43 UTC

> Brilliant - thank you for your detailed response Grant, you're completely 
> correct.  The ICMP errors are clearly being dropped somewhere outside my 
> control, and the backing off of the retries sending the full-size packet 
> fits perfectly.

Good.

> Knowing now what the issue is, I can find the millions of web pages about 
> the same problem - apologies for bothering you with such a FAQ!  The MSS 
> clamp works perfectly.

Nonsense!  You were not a bother at all.  This mailing list is for exactly these types of questions.  IMHO the Linux / Unix community is nothing if we do not try to help each other, rather than competing for each and every box to administer like so many of the Windows admins that I know.  :(

> Any thoughts regarding the 'leaking packets', reaching the external network 
> without getting masqueraded?  I'm monitoring now to see if I still get them, 
> which I possibly won't since they all seemed to be retransmits - I'll post 
> again if they're still around.  But here's the detail of one such packet, 
> grabbed from ppp0 on the gateway:

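Well, possibly.  You are NATing in the nat table POSTROUTING chain, as you should be.  However, I believe that only the first packet of a connection (one that connection tracking considers NEW) traverses the nat table; conntrack then applies the same translation to the rest of the packets of that connection.  So if a packet arrives that conntrack cannot associate with a connection it knows about (a late retransmitted FIN for a connection whose entry has already gone away, say), I could see it bypassing the MASQUERADE code and going out with its source IP unaltered.

Thus I wonder whether you could put a rule in your filter table to stop these packets going out your internet interface.  Note that forwarded packets traverse the FORWARD chain, not OUTPUT, and that the filter table sees them before POSTROUTING rewrites the source, so matching on your globally routable IP there would not work.  Dropping packets that conntrack marks INVALID, with something like
  -A FORWARD -o ppp0 -m state --state INVALID -j DROP
may be the thing to try.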
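
The idea can be illustrated with a deliberately simplified toy model in Python.  It is an assumption-laden sketch, not how the kernel's connection tracking is implemented: flows are keyed on the address/port 4-tuple, only a SYN may create a flow, ports are left unchanged, and `ToyMasquerade`, `forward` and `expire` are hypothetical names.  The external address is the one seen in the endpoint dump.

```python
from dataclasses import dataclass

EXTERNAL_IP = "85.210.143.231"  # public address seen in the endpoint dump


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    flags: frozenset


class ToyMasquerade:
    """Toy connection tracker; illustrative only, not the kernel's logic."""

    def __init__(self, drop_invalid=False):
        self.flows = set()  # 4-tuples of tracked connections
        self.drop_invalid = drop_invalid

    @staticmethod
    def _key(pkt):
        return (pkt.src, pkt.sport, pkt.dst, pkt.dport)

    def expire(self, pkt):
        """Forget the flow this packet belongs to (e.g. after a timeout)."""
        self.flows.discard(self._key(pkt))

    def forward(self, pkt):
        """Return the packet as it would leave ppp0, or None if dropped."""
        key = self._key(pkt)
        if key not in self.flows:
            if "SYN" in pkt.flags:
                self.flows.add(key)  # new flow: the nat rule is consulted once
            elif self.drop_invalid:
                return None          # untracked and not a SYN: dropped
            else:
                return pkt           # untracked: leaves with source unaltered
        # Tracked flow: rewrite the source address.
        return Packet(EXTERNAL_IP, pkt.sport, pkt.dst, pkt.dport, pkt.flags)


syn = Packet("192.168.31.191", 1537, "194.66.233.23", 80, frozenset({"SYN"}))
fin = Packet("192.168.31.191", 1537, "194.66.233.23", 80,
             frozenset({"FIN", "ACK"}))

gw = ToyMasquerade()
print("SYN leaves as:           ", gw.forward(syn).src)
print("FIN (tracked) leaves as: ", gw.forward(fin).src)
gw.expire(fin)
print("FIN (expired) leaves as: ", gw.forward(fin).src)
```

In this model the retransmitted FIN escapes with its private source address only once the flow has been forgotten, which would match the observation that every leaked packet was a retransmit; constructing the gateway with `drop_invalid=True` models dropping such packets instead.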

I do think this is an issue that should be further documented and brought to the attention of the netfilter developers to see what they think of it.  Unfortunately I am not one of the developers and thus cannot help you much more than that.

> 0000  00 04 02 00 00 00 00 00 00 00 00 00 00 00 08 00   ................
> 0010  45 00 00 28 4e 16 40 00 7f 06 21 f8 c0 a8 1f bf   E..(N.@...!.....
> 0020  c2 42 e9 17 06 01 00 50 6b ba 7a 19 88 1d 82 de   .B.....Pk.z.....
> 0030  50 11 ff ff 2c f1 00 00                           P...,...
> 
> [unpacking details from ethereal...]
> Frame 6655 (56 bytes on wire, 56 bytes captured)
>     Arrival Time: Sep  2, 2005 12:31:01.442977000
>     Time delta from previous packet: 0.300389000 seconds
>     Time since reference or first frame: 882.838297000 seconds
>     Frame Number: 6655
>     Packet Length: 56 bytes
>     Capture Length: 56 bytes
>     Protocols in frame: sll:ip:tcp
> Linux cooked capture
>     Packet type: Sent by us (4)
>     Link-layer address type: 512
>     Link-layer address length: 0
>     Source: <MISSING>
>     Protocol: IP (0x0800)
> Internet Protocol, Src: 192.168.31.191 (192.168.31.191), Dst: 194.66.233.23 (194.66.233.23)
>     Version: 4
>     Header length: 20 bytes
>     Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
>         0000 00.. = Differentiated Services Codepoint: Default (0x00)
>         .... ..0. = ECN-Capable Transport (ECT): 0
>         .... ...0 = ECN-CE: 0
>     Total Length: 40
>     Identification: 0x4e16 (19990)
>     Flags: 0x04 (Don't Fragment)
>         0... = Reserved bit: Not set
>         .1.. = Don't fragment: Set
>         ..0. = More fragments: Not set
>     Fragment offset: 0
>     Time to live: 127
>     Protocol: TCP (0x06)
>     Header checksum: 0x21f8 [correct]
>     Source: 192.168.31.191 (192.168.31.191)
>     Destination: 194.66.233.23 (194.66.233.23)
> Transmission Control Protocol, Src Port: 1537 (1537), Dst Port: http (80), Seq: 0, Ack: 0, Len: 0
>     Source port: 1537 (1537)
>     Destination port: http (80)
>     Sequence number: 0    (relative sequence number)
>     Acknowledgement number: 0    (relative ack number)
>     Header length: 20 bytes
>     Flags: 0x0011 (FIN, ACK)
>         0... .... = Congestion Window Reduced (CWR): Not set
>         .0.. .... = ECN-Echo: Not set
>         ..0. .... = Urgent: Not set
>         ...1 .... = Acknowledgment: Set
>         .... 0... = Push: Not set
>         .... .0.. = Reset: Not set
>         .... ..0. = Syn: Not set
>         .... ...1 = Fin: Set
>     Window size: 65535
>     Checksum: 0x2cf1 [correct]
>     SEQ/ACK analysis
>         TCP Analysis Flags
>             This frame is a (suspected) retransmission
>             The RTO for this segment was: 0.600510000 seconds
>             RTO based on delta from frame: 6653

Wow!  I have not seen so much detail on a packet in a LONG time.  At this moment I'm too out of it (I've been fighting a bug) to make much sense of it.  Looking it over anyway, the only thing that stands out is the line that states "...This frame is a (suspected) retransmission... ...RTO based on delta from frame: 6653...".



Grant. . . .

