netdev.vger.kernel.org archive mirror
* Bug? GRE tunnel periodically won't transmit some packets
@ 2011-11-07 16:21 Chris Siebenmann
  2011-11-07 16:55 ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-07 16:21 UTC (permalink / raw)
  To: netdev; +Cc: cks

 I have a weird problem where a GRE tunnel periodically won't transmit
some (TCP) packets, while at the same time it will transmit others just
fine. This is happening in the current kernel.org git head kernel as
well as earlier ones.

 The networking environment is a GRE tunnel over IPSec in tunnel mode
('esp/tunnel/...') over a DSL PPPoE link. What I observe is that
periodically outbound SSH connections stall early in the protocol
negotiation, and other TCP connections can similarly stall. Sometimes
they recover and sometimes they time out. The problem is pretty
reproducible and regular, although not constant (sometimes the affected
packets get through right away).

 I have tcpdump'd both the GRE tunnel device and the underlying DSL
PPPoE device and during a stall, the GRE tcpdump will show packets being
sent that do not appear on the DSL PPPoE link. All of the packets that
I've seen stalling have had 500 data octets.

Typical packets are:
	IP 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 22, win 91, options [nop,nop,TS val 143020 ecr 966040433], length 500

(here 128.100.3.52 is the GRE tunnel IP address of the machine
experiencing problems)

or ttcp:
	IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500

Ttcp had a whole run of 'length 500' packets fail to go through. SSH
will actually successfully transmit later (different-length) packets,
eg:
	128.100.3.52.52063 > 128.100.3.51.ssh: Flags [P.], seq 522:926, ack 22, win 91, options [nop,nop,TS val 143037 ecr 966040450], length 404

 The DSL PPPoE link has an MTU of 1492 and the GRE tunnel has an MTU of
1200 (on both ends). As far as I can tell they do pass packets of this
size. *However*, on kernels that display this problem tracepath and 'ip
route show table cache' both report that the GRE tunnel has a path MTU
of 854 going from 128.100.3.52 to 128.100.3.51; however, 128.100.3.51
sees a pmtu of 1200 for the path to 128.100.3.52.

 The machine experiencing these problems is a 64-bit x86_64 Fedora 15
machine with various kernels. The problem does not happen with the
current Fedora 14 kernel (nominally 2.6.35.14); it does happen with the
Fedora 15 kernel ('2.6.40.6' aka some version of 3.0.0), the Fedora 16
kernel (some version of 3.1.0) and on the current kernel.org git head
as of last night. I am not running NetworkManager; all networking is
statically configured and not changing during operation, and my IPSec
setup is statically keyed[*].

 I would be happy to run any debugging tests or give any further
information that people want. Should I try a different kernel git
repo than Linus's kernel.org one?

 Thanks in advance.

(While I'm reading the mailing list I'm not directly subscribed to it,
so copying me on replies will make sure that I see them immediately.)

	- cks
[*: I'm aware that this is not ideal from a security perspective since
    it relies on me manually rekeying everything every so often.]


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-07 16:21 Bug? GRE tunnel periodically won't transmit some packets Chris Siebenmann
@ 2011-11-07 16:55 ` Eric Dumazet
  2011-11-08  6:17   ` Chris Siebenmann
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-07 16:55 UTC (permalink / raw)
  To: Chris Siebenmann; +Cc: netdev

On Monday 07 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
> I have a weird problem where a GRE tunnel periodically won't transmit
> some (TCP) packets, while at the same time it will transmit others just
> fine. This is happening in the current kernel.org git head kernel as
> well as earlier ones.
> 
>  The networking environment is a GRE tunnel over IPSec in tunnel mode
> ('esp/tunnel/...') over a DSL PPPoE link. What I observe is that
> periodically outbound SSH connections stall early in the protocol
> negotiation, and other TCP connections can similarly stall. Sometimes
> they recover and sometimes they time out. The problem is pretty
> reproducible and regular, although not constant (sometimes the affected
> packets get through right away).
> 
>  I have tcpdump'd both the GRE tunnel device and the underlying DSL
> PPPoE device and during a stall, the GRE tcpdump will show packets being
> sent that do not appear on the DSL PPPoE link. All of the packets that
> I've seen stalling have had 500 data octets.
> 
> Typical packets are:
> 	IP 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 22, win 91, options [nop,nop,TS val 143020 ecr 966040433], length 500
> 
> (here 128.100.3.52 is the GRE tunnel IP address of the machine
> experiencing problems)
> 
> or ttcp:
> 	IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500
> 
> Ttcp had a whole run of 'length 500' packets fail to go through. SSH
> will actually successfully transmit later (different-length) packets,
> eg:
> 	128.100.3.52.52063 > 128.100.3.51.ssh: Flags [P.], seq 522:926, ack 22, win 91, options [nop,nop,TS val 143037 ecr 966040450], length 404
> 
>  The DSL PPPoE link has an MTU of 1492 and the GRE tunnel has an MTU of
> 1200 (on both ends). As far as I can tell they do pass packets of this
> size. *However*, on kernels that display this problem tracepath and 'ip
> route show table cache' both report that the GRE tunnel has a path MTU
> of 854 going from 128.100.3.52 to 128.100.3.51; however, 128.100.3.51
> sees a pmtu of 1200 for the path to 128.100.3.52.
> 
>  The machine experiencing these problems is a 64-bit x86_64 Fedora 15
> machine with various kernels. The problem does not happen with the
> current Fedora 14 kernel (nominally 2.6.35.14); it does happen with the
> Fedora 15 kernel ('2.6.40.6' aka some version of 3.0.0), the Fedora 16
> kernel (some version of 3.1.0) and on the current kernel.org git head
> as of last night. I am not running NetworkManager; all networking is
> statically configured and not changing during operation, and my IPSec
> setup is statically keyed[*].
> 
>  I would be happy to run any debugging tests or give any further
> information that people want. Should I try a different kernel git
> repo than Linus's kernel.org one?
> 
>  Thanks in advance.
> 
> (While I'm reading the mailing list I'm not directly subscribed to it,
> so copying me on replies will make sure that I see them immediately.)
> 

Do you have any errors on :

ip -s -d link show dev greXXXX


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-07 16:55 ` Eric Dumazet
@ 2011-11-08  6:17   ` Chris Siebenmann
  2011-11-08  6:43     ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08  6:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Siebenmann, netdev

| On Monday 07 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
| > I have a weird problem where a GRE tunnel periodically won't transmit
| > some (TCP) packets, while at the same time it will transmit others just
| > fine. This is happening in the current kernel.org git head kernel as
| > well as earlier ones.
[...]
| Do you have any errors on :
| 
| ip -s -d link show dev greXXXX

 I do indeed. When the problem is happening, I see TX errors counting
up one-for-one with packets that are not transmitted (and no RX
errors). Otherwise I don't see any errors. The other end of the GRE
tunnel shows no errors (TX or RX).
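
One way to watch this one-for-one relationship while reproducing a
stall is to poll the kernel's per-device counter. The following is a
minimal sketch, assuming the standard sysfs layout
(/sys/class/net/<dev>/statistics/tx_errors); the function names and the
sysfs_root parameter are hypothetical and exist only to keep the sketch
testable.

```python
import os
import time


def read_tx_errors(dev, sysfs_root="/sys/class/net"):
    """Return the current tx_errors counter for a network device."""
    path = os.path.join(sysfs_root, dev, "statistics", "tx_errors")
    with open(path) as f:
        return int(f.read().strip())


def watch_tx_errors(dev, interval=1.0, samples=10, sysfs_root="/sys/class/net"):
    """Poll tx_errors and return the increase seen in each interval."""
    deltas = []
    last = read_tx_errors(dev, sysfs_root)
    for _ in range(samples):
        time.sleep(interval)
        now = read_tx_errors(dev, sysfs_root)
        deltas.append(now - last)
        last = now
    return deltas
```

Running watch_tx_errors("extun") alongside a tcpdump of the tunnel
would show whether each stalled packet corresponds to one new error.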

 Further information: when the problem is not happening, SSH doesn't
seem to transmit 500-data-octet packets during startup. Instead I see:

	IP 128.100.3.52.42538 > 128.100.3.51.ssh: Flags [.], seq 22:824, ack 22, win 91, options [nop,nop,TS val 1393299 ecr 29703771], length 802

 I have also once seen an 'ip route show table cache' entry for a route
through the GRE tunnel with 552-byte MTU listed:

	24.173.24.46 from 128.100.3.52 dev extun 
	    cache  expires 21333540sec ipid 0x9e5c mtu 552

I haven't been able to reproduce this. I have seen listed mtu figures
in 'ip route show table cache' output routinely drop to 774, though.
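
The packet lengths in these captures line up with the cached path MTU
values through ordinary header arithmetic. A small sketch, assuming
IPv4 without IP options and TCP timestamps enabled (the captures show
'nop,nop,TS' in the options); the constant and function names are
illustrative only.

```python
IPV4_HEADER = 20        # bytes, IPv4 header without options
TCP_HEADER = 20         # bytes, base TCP header
TCP_TIMESTAMP_OPT = 12  # 10-byte timestamp option plus two NOP pads


def max_tcp_payload(path_mtu, timestamps=True):
    """Largest TCP payload that fits in one packet for a given path MTU."""
    overhead = IPV4_HEADER + TCP_HEADER
    if timestamps:
        overhead += TCP_TIMESTAMP_OPT
    return path_mtu - overhead


for mtu in (552, 854):
    print(mtu, max_tcp_payload(mtu))
# prints "552 500" and "854 802"
```

A 552-byte path MTU yields exactly the 'length 500' segments that
stall, and the 854-byte path MTU reported earlier in the thread yields
the 'length 802' segment seen when things work.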

(I would like to have more data on this but inconveniently the problem
is now not reproducing itself. When it comes back I'll capture more
information about route cache mtu values and error counts and see if
there's anything interesting.)

	- cks


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08  6:17   ` Chris Siebenmann
@ 2011-11-08  6:43     ` Eric Dumazet
  2011-11-08  7:08       ` Chris Siebenmann
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08  6:43 UTC (permalink / raw)
  To: Chris Siebenmann; +Cc: netdev

On Tuesday 08 November 2011 at 01:17 -0500, Chris Siebenmann wrote:
> | On Monday 07 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
> | > I have a weird problem where a GRE tunnel periodically won't transmit
> | > some (TCP) packets, while at the same time it will transmit others just
> | > fine. This is happening in the current kernel.org git head kernel as
> | > well as earlier ones.
> [...]
> | Do you have any errors on :
> | 
> | ip -s -d link show dev greXXXX
> 
>  I do indeed. When the problem is happening, I see TX errors counting
> up one-for-one with packets that are not transmitted (and no RX
> errors). Otherwise I don't see any errors. The other end of the GRE
> tunnel shows no errors (TX or RX).
> 
>  Further information: when the problem is not happening, SSH doesn't
> seem to transmit 500-data-octet packets during startup. Instead I see:
> 
> 	IP 128.100.3.52.42538 > 128.100.3.51.ssh: Flags [.], seq 22:824, ack 22, win 91, options [nop,nop,TS val 1393299 ecr 29703771], length 802
> 
>  I have also once seen an 'ip route show table cache' entry for a route
> through the GRE tunnel with 552-byte MTU listed:
> 
> 	24.173.24.46 from 128.100.3.52 dev extun 
> 	    cache  expires 21333540sec ipid 0x9e5c mtu 552
> 
> I haven't been able to reproduce this. I have seen listed mtu figures
> in 'ip route show table cache' output routinely drop to 774, though.
> 
> (I would like to have more data on this but inconveniently the problem
> is now not reproducing itself. When it comes back I'll capture more
> information about route cache mtu values and error counts and see if
> there's anything interesting.)
> 

OK, but could you please report the exact "ip -s -d link gre..."
output ?


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08  6:43     ` Eric Dumazet
@ 2011-11-08  7:08       ` Chris Siebenmann
  2011-11-08  7:34         ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08  7:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Siebenmann, netdev

| On Tuesday 08 November 2011 at 01:17 -0500, Chris Siebenmann wrote:
| > | On Monday 07 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
| > | > I have a weird problem where a GRE tunnel periodically won't transmit
| > | > some (TCP) packets, while at the same time it will transmit others just
| > | > fine. This is happening in the current kernel.org git head kernel as
| > | > well as earlier ones.
| > [...]
| > | Do you have any errors on :
| > | 
| > | ip -s -d link show dev greXXXX
| > 
| >  I do indeed. When the problem is happening, I see TX errors counting
| > up one-for-one with packets that are not transmitted (and no RX
| > errors). Otherwise I don't see any errors. The other end of the GRE
| > tunnel shows no errors (TX or RX).
[..]
| OK, but could you please report the exact "ip -s -d link gre..."
| output ?

Sure. Here it is:

7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
    link/gre 66.96.18.208 peer 128.100.3.58
    gre remote 128.100.3.58 local 66.96.18.208 dev ppp0 ttl inherit 
    RX: bytes  packets  errors  dropped overrun mcast   
    2793721    17015    0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    2242148    26824    18      0       0       0      

(The packet and byte counts were much lower when the problem happened;
this time around it happened relatively soon after I rebooted my machine
and brought up the PPPoE and GRE links, but hasn't happened since and
I've been using the GRE link.)

For vaguely historical reasons I don't use 'greXXX' as the name of
the GRE tunnel. I have a 'gre0' device, but it is not up. In case it
matters, its output is:

6: gre0: <NOARP> mtu 1476 qdisc noop state DOWN 
    link/gre 0.0.0.0 brd 0.0.0.0
    gre remote any local any ttl inherit nopmtudisc 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0      

 Let me know if you want a full dump of 'ip link show' (with or
without verbosity).

	- cks


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08  7:08       ` Chris Siebenmann
@ 2011-11-08  7:34         ` Eric Dumazet
  2011-11-08 10:25           ` Eric Dumazet
  2011-11-08 13:05           ` Chris Siebenmann
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08  7:34 UTC (permalink / raw)
  To: Chris Siebenmann; +Cc: netdev

On Tuesday 08 November 2011 at 02:08 -0500, Chris Siebenmann wrote:

>  Let me know if you want a full dump of 'ip link show' (with or
> without verbosity).
> 
> 	- cks

Oh yes, I meant "ip -s -s link show dev extun"

to get detailed info:

# ip -s -s link show dev ppp0
8: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UNKNOWN qlen 3
    link/ppp 
    RX: bytes  packets  errors  dropped overrun mcast   
    21120910   21103    0       0       0       0      
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    1310652    13736    0       0       0       0      
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0      


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08  7:34         ` Eric Dumazet
@ 2011-11-08 10:25           ` Eric Dumazet
  2011-11-08 13:05           ` Chris Siebenmann
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 10:25 UTC (permalink / raw)
  To: Chris Siebenmann; +Cc: netdev

On Tuesday 08 November 2011 at 08:34 +0100, Eric Dumazet wrote:
> On Tuesday 08 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
> 
> >  Let me know if you want a full dump of 'ip link show' (with or
> > without verbosity).
> > 
> > 	- cks
> 
> Oh yes, I meant "ip -s -s link show dev extun"
> 
> to get detailed info:
> 
> # ip -s -s link show dev ppp0
> 8: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 3
>     link/ppp 
>     RX: bytes  packets  errors  dropped overrun mcast   
>     21120910   21103    0       0       0       0      
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     1310652    13736    0       0       0       0      
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0      
> 


Hmm, by the way, I see that the default device queue length of ppp is 3.

You might have packet drops at the ppp0 device itself:

ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0

You could try to increase the ppp0 queue length or install a qdisc with
a 128-packet limit:

ifconfig ppp0 txqueuelen 32

or :

tc qdisc add dev ppp0 root sfq


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08  7:34         ` Eric Dumazet
  2011-11-08 10:25           ` Eric Dumazet
@ 2011-11-08 13:05           ` Chris Siebenmann
  2011-11-08 14:05             ` Eric Dumazet
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08 13:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Siebenmann, netdev

| On Tuesday 08 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
| >  Let me know if you want a full dump of 'ip link show' (with or
| > without verbosity).
| > 
| > 	- cks
| 
| Oh yes, I meant "ip -s -s link show dev extun"

7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
    link/gre 66.96.18.208 peer 128.100.3.58
    RX: bytes  packets  errors  dropped overrun mcast   
    3465662    19823    0       0       0       0      
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    2590152    30918    18      0       0       0      
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0      

Then:
| You might have packet drops at the ppp0 device itself
| 
| ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0

; ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
5: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
    link/ppp 
    RX: bytes  packets  errors  dropped overrun mcast   
    1065281803 2474545  0       0       0       0      
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    1046103545 2887730  0       0       0       0      
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0      
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1046103557 bytes 2887727 pkt (dropped 20, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

 I have now seen 552-byte mtus listed in 'ip route show table cache'
output. I can include the full output if you think it's of interest.
One of the remote IPs is a host we run, so I was able to test ssh to it
and it did not stall and did not seem to use 'length 500' packets in the
SSH connection, so this may be a red herring.

Okay, I just rebooted the machine (into the Fedora 15 kernel) and had
the problem reproduce. This time ppp0 shows no dropped packets while
extun counts up errors and I have a 552 mtu route (actually several of
them) in 'ip route show table cache':

6: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
    link/gre 66.96.18.208 peer 128.100.3.58
    RX: bytes  packets  errors  dropped overrun mcast   
    119000     653      0       0       0       0      
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    85848      989      17      0       0       0      
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0      

4: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
    link/ppp 
    RX: bytes  packets  errors  dropped overrun mcast   
    2152137    3743     0       0       0       0      
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    360478     4022     0       0       0       0      
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0      
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 360438 bytes 4018 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

Routes with low mtus for either the GRE gateway target or the host I am
trying to ssh to:

128.100.3.51 dev extun  src 128.100.3.52 
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
--
128.100.3.58 from 128.100.3.52 dev extun 
    cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0 
    cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0 
    cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.51 from 128.100.3.52 dev extun 
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.58 dev extun  src 128.100.3.52 
    cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.51 from 128.100.3.52 tos throughput dev extun 
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.51 from 128.100.3.52 dev extun 
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.51 from 128.100.3.52 tos lowdelay dev extun 
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10

 With the problem not happening any more, 'ip route' shows only a few
mtu 552 routes, none to the machines I am doing test ssh's to. The final
error count was 46 errors on extun (with 0 for all of the specific
causes) and no errors or dropped packets on ppp0.
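
Picking the suspicious entries out of 'ip route show table cache'
output by eye gets tedious. The following is a rough sketch of a
filter, assuming the two-line layout shown above (a route line followed
by an indented 'cache ...' line, with optional '--' separators left by
grep); the function name and the 1200 threshold are hypothetical
choices, not part of any tool.

```python
import re


def low_mtu_routes(text, threshold=1200):
    """Return (route, mtu) pairs whose cached mtu is below threshold."""
    results = []
    route = None
    for line in text.splitlines():
        if line.strip() in ("", "--"):
            continue  # skip blank lines and grep context separators
        if not line[0].isspace():
            route = " ".join(line.split())  # unindented line names the route
            continue
        # Indented line carries the cached metrics; "lock" may precede the value.
        m = re.search(r"\bmtu\s+(?:lock\s+)?(\d+)", line)
        if m and route is not None and int(m.group(1)) < threshold:
            results.append((route, int(m.group(1))))
    return results


sample = """\
128.100.3.51 dev extun  src 128.100.3.52
    cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0
    cache  mtu 1492 advmss 1452 hoplimit 64
"""
print(low_mtu_routes(sample))
```

Here only the 128.100.3.51 entry is reported, since its cached mtu of
552 is below the tunnel MTU of 1200.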

	- cks


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08 13:05           ` Chris Siebenmann
@ 2011-11-08 14:05             ` Eric Dumazet
  2011-11-10  5:16               ` Chris Siebenmann
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 14:05 UTC (permalink / raw)
  To: Chris Siebenmann; +Cc: netdev

On Tuesday 08 November 2011 at 08:05 -0500, Chris Siebenmann wrote:
> | On Tuesday 08 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
> | >  Let me know if you want a full dump of 'ip link show' (with or
> | > without verbosity).
> | > 
> | > 	- cks
> | 
> | Oh yes, I meant "ip -s -s link show dev extun"
> 
> 7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
>     link/gre 66.96.18.208 peer 128.100.3.58
>     RX: bytes  packets  errors  dropped overrun mcast   
>     3465662    19823    0       0       0       0      
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     2590152    30918    18      0       0       0      
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0      
> 
> Then:
> | You might have packet drops at the ppp0 device itself
> | 
> | ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
> 
> ; ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
> 5: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
>     link/ppp 
>     RX: bytes  packets  errors  dropped overrun mcast   
>     1065281803 2474545  0       0       0       0      
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     1046103545 2887730  0       0       0       0      
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0      
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 1046103557 bytes 2887727 pkt (dropped 20, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> 
>  I have now seen 552-byte mtus listed in 'ip route show table cache'
> output. I can include the full output if you think it's of interest.
> One of the remote IPs is a host we run, so I was able to test ssh to it
> and it did not stall and did not seem to use 'length 500' packets in the
> SSH connection, so this may be a red herring.
> 
> Okay, I just rebooted the machine (into the Fedora 15 kernel) and had
> the problem reproduce. This time ppp0 shows no dropped packets while
> extun counts up errors and I have a 552 mtu route (actually several of
> them) in 'ip route show table cache':
> 
> 6: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
>     link/gre 66.96.18.208 peer 128.100.3.58
>     RX: bytes  packets  errors  dropped overrun mcast   
>     119000     653      0       0       0       0      
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     85848      989      17      0       0       0      
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0      
> 
> 4: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
>     link/ppp 
>     RX: bytes  packets  errors  dropped overrun mcast   
>     2152137    3743     0       0       0       0      
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     360478     4022     0       0       0       0      
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0      
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 360438 bytes 4018 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> 
> Routes with low mtus for either the GRE gateway target or the host I am
> trying to ssh to:
> 
> 128.100.3.51 dev extun  src 128.100.3.52 
>     cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> --
> 128.100.3.58 from 128.100.3.52 dev extun 
>     cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.58 from 66.96.18.208 dev ppp0 
>     cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.58 from 66.96.18.208 dev ppp0 
>     cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.51 from 128.100.3.52 dev extun 
>     cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.58 dev extun  src 128.100.3.52 
>     cache  expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.51 from 128.100.3.52 tos throughput dev extun 
>     cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.51 from 128.100.3.52 dev extun 
>     cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.51 from 128.100.3.52 tos lowdelay dev extun 
>     cache  expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 
>  With the problem not happening any more, 'ip route' shows only a few
> mtu 552 routes, none to the machines I am doing test ssh's to. The final
> error count was 46 errors on extun (with 0 for all of the specific
> causes) and no errors or dropped packets on ppp0.
> 

So it appears the drop is in gre xmit because frame is bigger than
mtu...

Maybe you receive some strange ICMP (ICMP_FRAG_NEEDED) from a buggy
host ?

You could catch it with "tcpdump -s 1000 -i any icmp" maybe...


* Re: Bug? GRE tunnel periodically won't transmit some packets
  2011-11-08 14:05             ` Eric Dumazet
@ 2011-11-10  5:16               ` Chris Siebenmann
  2011-11-21  0:23                 ` Recursive routing causes MTU collapse (was Re: Bug? GRE tunnel periodically won't transmit some packets) Chris Siebenmann
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-10  5:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Siebenmann, netdev

| So it appears the drop is in gre xmit because frame is bigger than
| mtu...
| 
| Maybe you receive some strange ICMP (ICMP_FRAG_NEEDED) from a buggy
| host ?
| 
| You could catch it with "tcpdump -s 1000 -i any icmp" maybe...

 The problem went away for several days and then came roaring back
just now. I couldn't see any outside ICMP messages like this while
the problem was happening. I did see ICMP messages, but they were
locally generated:

	IP 128.100.3.52 > 128.100.3.52: ICMP 128.100.3.51 unreachable - need to frag (mtu 478), length 556

(I verified that these were for the things that were stalling with
'tcpdump -vv -ee'.)

At the time that this was happening, I could see a lot of 'ip route show
table cache' entries like this:

	128.100.3.58 from 66.96.18.208 dev ppp0  src 66.96.18.208
	    cache  expires 286sec ipid 0xdeda mtu 552 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9 iif lo

There were a bunch of other 'mtu 552' routes. Flushing the routing cache
(with 'ip route flush cache; ip route show table cache' to verify that
it had flushed) did not change the situation; the problem continued and
the 'mtu 552' routes came back as fast as I could check (they appeared
to be there from the moment the routing cache was repopulated).

 In general the route cache appears to have strangely low MTUs listed
for the remote end of the tunnel (at least after the problem's
happened), eg:

	128.100.3.58 from 66.96.18.208 dev ppp0 
	    cache  expires 203sec ipid 0xdeda mtu 934 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9
	128.100.3.58 from 66.96.18.208 dev ppp0  src 66.96.18.208 
	    cache  expires 203sec ipid 0xdeda mtu 934 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9 iif lo

I would normally expect this to be 1492, the PPP link MTU.

 This does not seem to happen on the Fedora 14 kernel (the one where
the problem doesn't happen). There the route is listed as:

	128.100.3.58 from 66.96.18.208 dev ppp0 
	    cache  mtu 1492 advmss 1452 hoplimit 64
	128.100.3.58 from 66.96.18.208 dev ppp0 
	    cache  mtu 1492 advmss 1452 hoplimit 64

(to quote what I *think* are the relevant entries.)

 Since things may be pointing towards routing cache oddities, I should
mention that I have a somewhat peculiar policy based routing setup
involving this GRE tunnel. While it's been working fine for literally
years, it's possible that recent changes in the area changed something
so that peculiar things now happen; if you think it might be relevant, I
can provide a dump of the routing rules and explain the setup.

(Part of why this occurs to me now is that I know it's possible to have
a connection to 128.100.3.58 (the remote end of the tunnel) that runs
through the tunnel itself, and I've seen such routes appear in the 'ip
route show table cache' output.)

		- cks


* Recursive routing causes MTU collapse (was Re: Bug? GRE tunnel periodically won't transmit some packets)
  2011-11-10  5:16               ` Chris Siebenmann
@ 2011-11-21  0:23                 ` Chris Siebenmann
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-21  0:23 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: cks

 I believe I've identified the root cause of my GRE tunnel packet
transmission problems. The short summary is that I have 'recursive'
routing, where traffic for the tunnel endpoint can nominally be routed
over the tunnel itself. In current kernel versions, when a packet for
the tunnel endpoint is actually routed over the tunnel, the path MTUs
determined for the endpoint and the tunnel both collapse down to very
small values.

 Here is the routing I have. First, the links themselves, for the DSL
PPPoE device and the GRE tunnel:

3: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
    link/ppp 
    inet 66.96.18.208 peer 66.96.31.6/32 scope global ppp0
5: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN 
    link/gre 66.96.18.208 peer 128.100.3.58
    inet 128.100.3.52/32 scope global extun

(My IPSec policy forces GRE traffic between 128.100.3.58 and
66.96.18.208 to be encrypted in 'esp/tunnel' mode.)

 Now the recursion. To reach other machines on 128.100.3.0/24, I route
128.100.3.0/24 over the GRE tunnel:
	; ip route list match 128.100.3.51
	default dev ppp0  scope link
	128.100.3.0/24 dev extun  scope link

 I also have policy based routing set to force traffic with an IP
origin of 66.96.18.208 out over the PPP link and traffic with an
IP origin of 128.100.3.52 out the GRE tunnel.

 With this setup in place, if I do anything that tries to talk to
128.100.3.58 (such as ping or ssh) what I get is an immediate path
MTU collapse for the 66.96.18.208 -> 128.100.3.58 link used by the
GRE tunnel, ending when the path MTU for 66.96.18.208 -> 128.100.3.58
reaches 552 octets. At this point various things choke (I am guessing
because the GRE tunnel expects a minimum MTU of 576 octets).
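
The shape of such a collapse can be illustrated with a toy model. This
is not the kernel's code path. It only assumes that each recursive pass
through the tunnel lowers the learned path MTU by a fixed encapsulation
overhead until a floor is reached. The 80-byte step is an assumption
inferred from the spacing of values reported in this thread (934, 854,
774, 614), and the 552 floor is simply where the collapse was observed
to stop; it may correspond to the kernel's minimum path MTU setting,
but nothing in the thread confirms that.

```python
def model_pmtu_collapse(start_mtu, overhead, floor=552):
    """Shrink an MTU by `overhead` per pass until it reaches `floor`."""
    if overhead <= 0:
        raise ValueError("overhead must be positive")
    values = [start_mtu]
    mtu = start_mtu
    while mtu > floor:
        # Each pass re-encapsulates the packet, so the usable MTU drops
        # by the overhead, but never below the floor.
        mtu = max(mtu - overhead, floor)
        values.append(mtu)
    return values


print(model_pmtu_collapse(934, 80))
# prints [934, 854, 774, 694, 614, 552]
```

Under these assumptions the model passes through every low MTU value
quoted in the thread before settling at 552.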

 If I add a host route for 128.100.3.58 that forces traffic for it
through ppp0 I can mostly avoid this MTU collapse:
	; ip route list exact 128.100.3.58
	128.100.3.58 dev ppp0  scope link  src 66.96.18.208  mtu 1492

 However, even with this if I explicitly force traffic for 128.100.3.58
over the GRE tunnel (such as by specifying the IP source address
so as to make my policy based routing kick in) I still see the MTU
collapse. Using 'mtu lock 1492' instead of plain 'mtu 1492' on this
host-based route does not appear to change anything.

 This did not happen back in kernel 2.6.35.14 (the Fedora 14 kernel) and
previous kernels (going back years). In that kernel everything was happy
even without the ppp0-forcing host route for 128.100.3.58 and I could
talk to 128.100.3.58 over the GRE tunnel without causing any path MTU
changes (and without problems in general).

(This always made my head hurt a little bit but since it worked,
I didn't worry about it.)

	- cks
