* Bug? GRE tunnel periodically won't transmit some packets
@ 2011-11-07 16:21 Chris Siebenmann
2011-11-07 16:55 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-07 16:21 UTC (permalink / raw)
To: netdev; +Cc: cks
I have a weird problem where a GRE tunnel periodically won't transmit
some (TCP) packets, while at the same time it will transmit others just
fine. This is happening in the current kernel.org git head kernel as
well as earlier ones.
The networking environment is a GRE tunnel over IPSec in tunnel mode
('esp/tunnel/...') over a DSL PPPoE link. What I observe is that
periodically outbound SSH connections stall early in the protocol
negotiation, and other TCP connections can similarly stall. Sometimes
they recover and sometimes they time out. The problem is pretty
reproducible and regular, although not constant (sometimes the affected
packets get through right away).
I have tcpdump'd both the GRE tunnel device and the underlying DSL
PPPoE device and during a stall, the GRE tcpdump will show packets being
sent that do not appear on the DSL PPPoE link. All of the packets that
I've seen stalling have had 500 data octets.
Typical packets are:
IP 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 22, win 91, options [nop,nop,TS val 143020 ecr 966040433], length 500
(here 128.100.3.52 is the GRE tunnel IP address of the machine
experiencing problems)
or ttcp:
IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500
Ttcp had a whole run of 'length 500' packets fail to go through. SSH
will actually successfully transmit later (different-length) packets,
eg:
128.100.3.52.52063 > 128.100.3.51.ssh: Flags [P.], seq 522:926, ack 22, win 91, options [nop,nop,TS val 143037 ecr 966040450], length 404
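One way to check whether stalled packets really cluster at a single payload size is to tally the trailing "length N" field of saved tcpdump text output. The following is only a minimal sketch in Python; the function name `payload_lengths` and the regular expression are illustrative assumptions, and the sample lines are the three packets quoted above.

```python
import re
from collections import Counter

# Matches the trailing "length N" field that tcpdump prints for each packet.
LENGTH_RE = re.compile(r"length (\d+)\s*$")

def payload_lengths(lines):
    """Count how often each payload length appears in tcpdump text output."""
    counts = Counter()
    for line in lines:
        match = LENGTH_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Sample lines copied from the captures quoted in this message.
sample = [
    "IP 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 22, win 91, options [nop,nop,TS val 143020 ecr 966040433], length 500",
    "IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500",
    "128.100.3.52.52063 > 128.100.3.51.ssh: Flags [P.], seq 522:926, ack 22, win 91, options [nop,nop,TS val 143037 ecr 966040450], length 404",
]

for length, count in sorted(payload_lengths(sample).items()):
    print(length, count)
```

Running the same tally over the capture from the GRE device during a stall would show whether anything other than 500-octet packets is affected.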
The DSL PPPoE link has an MTU of 1492 and the GRE tunnel has an MTU of
1200 (on both ends). As far as I can tell they do pass packets of this
size. *However*, on kernels that display this problem, tracepath and 'ip
route show table cache' both report that the GRE tunnel has a path MTU
of 854 going from 128.100.3.52 to 128.100.3.51, while 128.100.3.51
sees a path MTU of 1200 for the path to 128.100.3.52.
The machine experiencing these problems is a 64-bit x86_64 Fedora 15
machine with various kernels. The problem does not happen with the
current Fedora 14 kernel (nominally 2.6.35.14); it does happen with the
Fedora 15 kernel ('2.6.40.6' aka some version of 3.0.0), the Fedora 16
kernel (some version of 3.1.0) and on the current kernel.org git head
as of last night. I am not running NetworkManager; all networking is
statically configured and not changing during operation, and my IPSec
setup is statically keyed[*].
I would be happy to run any debugging tests or give any further
information that people want. Should I try a different kernel git
repo than Linus's kernel.org one?
Thanks in advance.
(While I'm reading the mailing list I'm not directly subscribed to it,
so copying me on replies will make sure that I see them immediately.)
- cks
[*: I'm aware that this is not ideal from a security perspective since
it relies on me manually rekeying everything every so often.]
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-07 16:21 Bug? GRE tunnel periodically won't transmit some packets Chris Siebenmann
@ 2011-11-07 16:55 ` Eric Dumazet
2011-11-08 6:17 ` Chris Siebenmann
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-07 16:55 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: netdev
On Monday, 7 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
> I have a weird problem where a GRE tunnel periodically won't transmit
> some (TCP) packets, while at the same time it will transmit others just
> fine. This is happening in the current kernel.org git head kernel as
> well as earlier ones.
>
> The networking environment is a GRE tunnel over IPSec in tunnel mode
> ('esp/tunnel/...') over a DSL PPPoE link. What I observe is that
> periodically outbound SSH connections stall early in the protocol
> negotiation, and other TCP connections can similarly stall. Sometimes
> they recover and sometimes they time out. The problem is pretty
> reproducible and regular, although not constant (sometimes the affected
> packets get through right away).
>
> I have tcpdump'd both the GRE tunnel device and the underlying DSL
> PPPoE device and during a stall, the GRE tcpdump will show packets being
> sent that do not appear on the DSL PPPoE link. All of the packets that
> I've seen stalling have had 500 data octets.
>
> Typical packets are:
> IP 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [.], seq 22:522, ack 22, win 91, options [nop,nop,TS val 143020 ecr 966040433], length 500
>
> (here 128.100.3.52 is the GRE tunnel IP address of the machine
> experiencing problems)
>
> or ttcp:
> IP 128.100.3.52.46585 > 128.100.3.51.5001: Flags [.], seq 1:501, ack 1, win 91, options [nop,nop,TS val 729200 ecr 979199256], length 500
>
> Ttcp had a whole run of 'length 500' packets fail to go through. SSH
> will actually successfully transmit later (different-length) packets,
> eg:
> 128.100.3.52.52063 > 128.100.3.51.ssh: Flags [P.], seq 522:926, ack 22, win 91, options [nop,nop,TS val 143037 ecr 966040450], length 404
>
> The DSL PPPoE link has an MTU of 1492 and the GRE tunnel has an MTU of
> 1200 (on both ends). As far as I can tell they do pass packets of this
> size. *However*, on kernels that display this problem, tracepath and 'ip
> route show table cache' both report that the GRE tunnel has a path MTU
> of 854 going from 128.100.3.52 to 128.100.3.51, while 128.100.3.51
> sees a path MTU of 1200 for the path to 128.100.3.52.
>
> The machine experiencing these problems is a 64-bit x86_64 Fedora 15
> machine with various kernels. The problem does not happen with the
> current Fedora 14 kernel (nominally 2.6.35.14); it does happen with the
> Fedora 15 kernel ('2.6.40.6' aka some version of 3.0.0), the Fedora 16
> kernel (some version of 3.1.0) and on the current kernel.org git head
> as of last night. I am not running NetworkManager; all networking is
> statically configured and not changing during operation, and my IPSec
> setup is statically keyed[*].
>
> I would be happy to run any debugging tests or give any further
> information that people want. Should I try a different kernel git
> repo than Linus's kernel.org one?
>
> Thanks in advance.
>
> (While I'm reading the mailing list I'm not directly subscribed to it,
> so copying me on replies will make sure that I see them immediately.)
>
Do you have any errors on :
ip -s -d link show dev greXXXX
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-07 16:55 ` Eric Dumazet
@ 2011-11-08 6:17 ` Chris Siebenmann
2011-11-08 6:43 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08 6:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Chris Siebenmann, netdev
| On Monday, 7 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
| > I have a weird problem where a GRE tunnel periodically won't transmit
| > some (TCP) packets, while at the same time it will transmit others just
| > fine. This is happening in the current kernel.org git head kernel as
| > well as earlier ones.
[...]
| Do you have any errors on :
|
| ip -s -d link show dev greXXXX
I do indeed. When the problem is happening, I see TX errors counting
up one-for-one with packets that are not transmitted (and no RX
errors). Otherwise I don't see any errors. The other end of the GRE
tunnel shows no errors (TX or RX).
Further information: when the problem is not happening, SSH doesn't
seem to transmit 500-data-octet packets during startup. Instead I see:
IP 128.100.3.52.42538 > 128.100.3.51.ssh: Flags [.], seq 22:824, ack 22, win 91, options [nop,nop,TS val 1393299 ecr 29703771], length 802
I have also once seen an 'ip route show table cache' entry for a route
through the GRE tunnel with 552-byte MTU listed:
24.173.24.46 from 128.100.3.52 dev extun
cache expires 21333540sec ipid 0x9e5c mtu 552
I haven't been able to reproduce this. I have seen listed mtu figures
in 'ip route show table cache' output routinely drop to 774, though.
(I would like to have more data on this but inconveniently the problem
is now not reproducing itself. When it comes back I'll capture more
information about route cache mtu values and error counts and see if
there's anything interesting.)
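The packet sizes and the MTU figures seen so far line up arithmetically, which may be worth keeping in mind. The sketch below is a small Python calculation, assuming a 20-byte IPv4 header without options, a 20-byte base TCP header, and 12 bytes for the timestamp option that the tcpdump output shows as "nop,nop,TS"; the function name `max_tcp_payload` is made up for illustration.

```python
IPV4_HEADER = 20        # assumption: no IP options
TCP_HEADER = 20         # base TCP header
TCP_TIMESTAMP_OPT = 12  # nop,nop,TS as printed in the tcpdump output

def max_tcp_payload(path_mtu):
    """Largest TCP payload that fits in one IPv4 packet of size path_mtu."""
    return path_mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMP_OPT

for mtu in (552, 854, 1200):
    print(mtu, max_tcp_payload(mtu))
```

Under those assumptions a path MTU of 552 gives a 500-octet payload and a path MTU of 854 gives 802 octets, which match the 'length 500' and 'length 802' packets in the captures. This is an inference from the numbers, not something the captures prove.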
- cks
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 6:17 ` Chris Siebenmann
@ 2011-11-08 6:43 ` Eric Dumazet
2011-11-08 7:08 ` Chris Siebenmann
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 6:43 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: netdev
On Tuesday, 8 November 2011 at 01:17 -0500, Chris Siebenmann wrote:
> | On Monday, 7 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
> | > I have a weird problem where a GRE tunnel periodically won't transmit
> | > some (TCP) packets, while at the same time it will transmit others just
> | > fine. This is happening in the current kernel.org git head kernel as
> | > well as earlier ones.
> [...]
> | Do you have any errors on :
> |
> | ip -s -d link show dev greXXXX
>
> I do indeed. When the problem is happening, I see TX errors counting
> up one-for-one with packets that are not transmitted (and no RX
> errors). Otherwise I don't see any errors. The other end of the GRE
> tunnel shows no errors (TX or RX).
>
> Further information: when the problem is not happening, SSH doesn't
> seem to transmit 500-data-octet packets during startup. Instead I see:
>
> IP 128.100.3.52.42538 > 128.100.3.51.ssh: Flags [.], seq 22:824, ack 22, win 91, options [nop,nop,TS val 1393299 ecr 29703771], length 802
>
> I have also once seen an 'ip route show table cache' entry for a route
> through the GRE tunnel with 552-byte MTU listed:
>
> 24.173.24.46 from 128.100.3.52 dev extun
> cache expires 21333540sec ipid 0x9e5c mtu 552
>
> I haven't been able to reproduce this. I have seen listed mtu figures
> in 'ip route show table cache' output routinely drop to 774, though.
>
> (I would like to have more data on this but inconveniently the problem
> is now not reproducing itself. When it comes back I'll capture more
> information about route cache mtu values and error counts and see if
> there's anything interesting.)
>
OK, but could you please report the exact "ip -s -d link gre..."
output ?
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 6:43 ` Eric Dumazet
@ 2011-11-08 7:08 ` Chris Siebenmann
2011-11-08 7:34 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08 7:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Chris Siebenmann, netdev
| On Tuesday, 8 November 2011 at 01:17 -0500, Chris Siebenmann wrote:
| > | On Monday, 7 November 2011 at 11:21 -0500, Chris Siebenmann wrote:
| > | > I have a weird problem where a GRE tunnel periodically won't transmit
| > | > some (TCP) packets, while at the same time it will transmit others just
| > | > fine. This is happening in the current kernel.org git head kernel as
| > | > well as earlier ones.
| > [...]
| > | Do you have any errors on :
| > |
| > | ip -s -d link show dev greXXXX
| >
| > I do indeed. When the problem is happening, I see TX errors counting
| > up one-for-one with packets that are not transmitted (and no RX
| > errors). Otherwise I don't see any errors. The other end of the GRE
| > tunnel shows no errors (TX or RX).
[..]
| OK, but could you please report the exact "ip -s -d link gre..."
| output ?
Sure. Here it is:
7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
link/gre 66.96.18.208 peer 128.100.3.58
gre remote 128.100.3.58 local 66.96.18.208 dev ppp0 ttl inherit
RX: bytes packets errors dropped overrun mcast
2793721 17015 0 0 0 0
TX: bytes packets errors dropped carrier collsns
2242148 26824 18 0 0 0
(The packet and byte counts were much lower when the problem happened;
this time around it happened relatively soon after I rebooted my machine
and brought up the PPPoE and GRE links, but hasn't happened since and
I've been using the GRE link.)
For vaguely historical reasons I don't use 'greXXX' as the name of
the GRE tunnel. I have a 'gre0' device, but it is not up. In case it
matters, its output is:
6: gre0: <NOARP> mtu 1476 qdisc noop state DOWN
link/gre 0.0.0.0 brd 0.0.0.0
gre remote any local any ttl inherit nopmtudisc
RX: bytes packets errors dropped overrun mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
Let me know if you want a full dump of 'ip link show' (with or
without verbosity).
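To watch the TX error counter over time without reading the table by eye, the statistics block can be parsed from saved 'ip -s link' text. This is a rough Python sketch that assumes the layout shown above (an "RX:" or "TX:" header line followed by one line of numbers); `parse_link_stats` is a hypothetical helper, not part of iproute2.

```python
def parse_link_stats(text):
    """Return {'RX': {...}, 'TX': {...}} from 'ip -s link show' text output."""
    stats = {}
    lines = [line.strip() for line in text.splitlines()]
    # Pair each line with the one after it: a header line is followed by values.
    for header, values in zip(lines, lines[1:]):
        # "RX errors:" / "TX errors:" detail headers do not start with "RX:"/"TX:".
        if header.startswith(("RX:", "TX:")):
            direction, *names = header.split()
            stats[direction.rstrip(":")] = dict(zip(names, map(int, values.split())))
    return stats

# Sample copied from the extun output in this message.
sample = """\
7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
    link/gre 66.96.18.208 peer 128.100.3.58
    gre remote 128.100.3.58 local 66.96.18.208 dev ppp0 ttl inherit
    RX: bytes packets errors dropped overrun mcast
    2793721 17015 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    2242148 26824 18 0 0 0
"""

stats = parse_link_stats(sample)
print("TX errors:", stats["TX"]["errors"])
```

Sampling this periodically during a stall would show whether the error count rises in step with the packets that go missing.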
- cks
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 7:08 ` Chris Siebenmann
@ 2011-11-08 7:34 ` Eric Dumazet
2011-11-08 10:25 ` Eric Dumazet
2011-11-08 13:05 ` Chris Siebenmann
0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 7:34 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: netdev
On Tuesday, 8 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
> Let me know if you want a full dump of 'ip link show' (with or
> without verbosity).
>
> - cks
Oh yes, I meant "ip -s -s link show dev extun"
to get detailed infos :
# ip -s -s link show dev ppp0
8: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UNKNOWN qlen 3
link/ppp
RX: bytes packets errors dropped overrun mcast
21120910 21103 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1310652 13736 0 0 0 0
TX errors: aborted fifo window heartbeat
0 0 0 0
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 7:34 ` Eric Dumazet
@ 2011-11-08 10:25 ` Eric Dumazet
2011-11-08 13:05 ` Chris Siebenmann
1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 10:25 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: netdev
On Tuesday, 8 November 2011 at 08:34 +0100, Eric Dumazet wrote:
> On Tuesday, 8 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
>
> > Let me know if you want a full dump of 'ip link show' (with or
> > without verbosity).
> >
> > - cks
>
> Oh yes, I meant "ip -s -s link show dev extun"
>
> to get detailed infos :
>
> # ip -s -s link show dev ppp0
> 8: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 3
> link/ppp
> RX: bytes packets errors dropped overrun mcast
> 21120910 21103 0 0 0 0
> RX errors: length crc frame fifo missed
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 1310652 13736 0 0 0 0
> TX errors: aborted fifo window heartbeat
> 0 0 0 0
>
Hmm, by the way, I see that the default device queue length of ppp is 3.
You might have packet drops at the ppp0 device itself:
ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
You could try to increase the ppp0 queue length or install a qdisc with
a 128-packet limit:
ifconfig ppp0 txqueuelen 32
or :
tc qdisc add dev ppp0 root sfq
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 7:34 ` Eric Dumazet
2011-11-08 10:25 ` Eric Dumazet
@ 2011-11-08 13:05 ` Chris Siebenmann
2011-11-08 14:05 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-08 13:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Chris Siebenmann, netdev
| On Tuesday, 8 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
| > Let me know if you want a full dump of 'ip link show' (with or
| > without verbosity).
| >
| > - cks
|
| Oh yes, I meant "ip -s -s link show dev extun"
7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
link/gre 66.96.18.208 peer 128.100.3.58
RX: bytes packets errors dropped overrun mcast
3465662 19823 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
2590152 30918 18 0 0 0
TX errors: aborted fifo window heartbeat
0 0 0 0
Then:
| You might have packet drops at the ppp0 device itself
|
| ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
; ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
5: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
link/ppp
RX: bytes packets errors dropped overrun mcast
1065281803 2474545 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1046103545 2887730 0 0 0 0
TX errors: aborted fifo window heartbeat
0 0 0 0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1046103557 bytes 2887727 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
I have now seen 552-byte mtus listed in 'ip route show table cache'
output. I can include the full output if you think it's of interest.
One of the remote IPs is a host we run, so I was able to test ssh to it
and it did not stall and did not seem to use 'length 500' packets in the
SSH connection, so this may be a red herring.
Okay, I just rebooted the machine (into the Fedora 15 kernel) and had
the problem reproduce. This time ppp0 shows no dropped packets while
extun counts up errors and I have a 552 mtu route (actually several of
them) in 'ip route show table cache':
6: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
link/gre 66.96.18.208 peer 128.100.3.58
RX: bytes packets errors dropped overrun mcast
119000 653 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
85848 989 17 0 0 0
TX errors: aborted fifo window heartbeat
0 0 0 0
4: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
link/ppp
RX: bytes packets errors dropped overrun mcast
2152137 3743 0 0 0 0
RX errors: length crc frame fifo missed
0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
360478 4022 0 0 0 0
TX errors: aborted fifo window heartbeat
0 0 0 0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 360438 bytes 4018 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Routes with low mtus for either the GRE gateway target or the host I am
trying to ssh to:
128.100.3.51 dev extun src 128.100.3.52
cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
--
128.100.3.58 from 128.100.3.52 dev extun
cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0
cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0
cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.51 from 128.100.3.52 dev extun
cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.58 dev extun src 128.100.3.52
cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.51 from 128.100.3.52 tos throughput dev extun
cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.51 from 128.100.3.52 dev extun
cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
128.100.3.51 from 128.100.3.52 tos lowdelay dev extun
cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
With the problem not happening any more, 'ip route' shows only a few
mtu 552 routes, none to the machines I am doing test ssh's to. The final
error count was 46 errors on extun (with 0 for all of the specific
causes) and no errors or dropped packets on ppp0.
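For picking out suspicious entries from a long 'ip route show table cache' dump, something like the following Python sketch could be used. It assumes the two-line layout shown above (a route line followed by a "cache ..." line) and skips the "--" separators; `low_mtu_routes` and the threshold are illustrative choices, and the third sample entry is only there as a contrast case.

```python
import re

# Matches "mtu 552" and also "mtu lock 1492".
MTU_RE = re.compile(r"\bmtu (?:lock )?(\d+)")

def low_mtu_routes(text, threshold):
    """Return (route_line, mtu) pairs whose cached mtu is below threshold."""
    found = []
    route = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "--":
            continue
        if line.startswith("cache"):
            match = MTU_RE.search(line)
            if route is not None and match and int(match.group(1)) < threshold:
                found.append((route, int(match.group(1))))
        else:
            route = line
    return found

sample = """\
128.100.3.51 dev extun src 128.100.3.52
    cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0
    cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
--
128.100.3.58 from 66.96.18.208 dev ppp0
    cache mtu 1492 advmss 1452 hoplimit 64
"""

for route, mtu in low_mtu_routes(sample, threshold=1200):
    print(mtu, route)
```

A threshold of 1200 is used here because that is the configured MTU of the GRE tunnel; any cached value below it is at least worth a look.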
- cks
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 13:05 ` Chris Siebenmann
@ 2011-11-08 14:05 ` Eric Dumazet
2011-11-10 5:16 ` Chris Siebenmann
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-11-08 14:05 UTC (permalink / raw)
To: Chris Siebenmann; +Cc: netdev
On Tuesday, 8 November 2011 at 08:05 -0500, Chris Siebenmann wrote:
> | On Tuesday, 8 November 2011 at 02:08 -0500, Chris Siebenmann wrote:
> | > Let me know if you want a full dump of 'ip link show' (with or
> | > without verbosity).
> | >
> | > - cks
> |
> | Oh yes, I meant "ip -s -s link show dev extun"
>
> 7: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
> link/gre 66.96.18.208 peer 128.100.3.58
> RX: bytes packets errors dropped overrun mcast
> 3465662 19823 0 0 0 0
> RX errors: length crc frame fifo missed
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 2590152 30918 18 0 0 0
> TX errors: aborted fifo window heartbeat
> 0 0 0 0
>
> Then:
> | You might have packet drops at the ppp0 device itself
> |
> | ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
>
> ; ip -s -s link show dev ppp0 ; tc -s -d qdisc show dev ppp0
> 5: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
> link/ppp
> RX: bytes packets errors dropped overrun mcast
> 1065281803 2474545 0 0 0 0
> RX errors: length crc frame fifo missed
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 1046103545 2887730 0 0 0 0
> TX errors: aborted fifo window heartbeat
> 0 0 0 0
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> Sent 1046103557 bytes 2887727 pkt (dropped 20, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> I have now seen 552-byte mtus listed in 'ip route show table cache'
> output. I can include the full output if you think it's of interest.
> One of the remote IPs is a host we run, so I was able to test ssh to it
> and it did not stall and did not seem to use 'length 500' packets in the
> SSH connection, so this may be a red herring.
>
> Okay, I just rebooted the machine (into the Fedora 15 kernel) and had
> the problem reproduce. This time ppp0 shows no dropped packets while
> extun counts up errors and I have a 552 mtu route (actually several of
> them) in 'ip route show table cache':
>
> 6: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
> link/gre 66.96.18.208 peer 128.100.3.58
> RX: bytes packets errors dropped overrun mcast
> 119000 653 0 0 0 0
> RX errors: length crc frame fifo missed
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 85848 989 17 0 0 0
> TX errors: aborted fifo window heartbeat
> 0 0 0 0
>
> 4: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
> link/ppp
> RX: bytes packets errors dropped overrun mcast
> 2152137 3743 0 0 0 0
> RX errors: length crc frame fifo missed
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 360478 4022 0 0 0 0
> TX errors: aborted fifo window heartbeat
> 0 0 0 0
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> Sent 360438 bytes 4018 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> Routes with low mtus for either the GRE gateway target or the host I am
> trying to ssh to:
>
> 128.100.3.51 dev extun src 128.100.3.52
> cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> --
> 128.100.3.58 from 128.100.3.52 dev extun
> cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.58 from 66.96.18.208 dev ppp0
> cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.58 from 66.96.18.208 dev ppp0
> cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.51 from 128.100.3.52 dev extun
> cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.58 dev extun src 128.100.3.52
> cache expires 397sec ipid 0xdeda mtu 614 rtt 15ms rttvar 15ms cwnd 10
> --
> 128.100.3.51 from 128.100.3.52 tos throughput dev extun
> cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.51 from 128.100.3.52 dev extun
> cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
> 128.100.3.51 from 128.100.3.52 tos lowdelay dev extun
> cache expires 401sec ipid 0x941a mtu 552 rtt 27ms rttvar 27ms cwnd 10
>
> With the problem not happening any more, 'ip route' shows only a few
> mtu 552 routes, none to the machines I am doing test ssh's to. The final
> error count was 46 errors on extun (with 0 for all of the specific
> causes) and no errors or dropped packets on ppp0.
>
So it appears the drop is in gre xmit because frame is bigger than
mtu...
Maybe you receive some strange ICMP (ICMP_FRAG_NEEDED) from a buggy
host ?
You could catch it with "tcpdump -s 1000 -i any icmp" maybe...
* Re: Bug? GRE tunnel periodically won't transmit some packets
2011-11-08 14:05 ` Eric Dumazet
@ 2011-11-10 5:16 ` Chris Siebenmann
2011-11-21 0:23 ` Recursive routing causes MTU collapse (was Re: Bug? GRE tunnel periodically won't transmit some packets) Chris Siebenmann
0 siblings, 1 reply; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-10 5:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Chris Siebenmann, netdev
| So it appears the drop is in gre xmit because frame is bigger than
| mtu...
|
| Maybe you receive some strange ICMP (ICMP_FRAG_NEEDED) from a buggy
| host ?
|
| You could catch it with "tcpdump -s 1000 -i any icmp" maybe...
The problem went away for several days and then came roaring back
just now. I couldn't see any outside ICMP messages like this while
the problem was happening. I did see ICMP messages, but they were
locally generated:
IP 128.100.3.52 > 128.100.3.52: ICMP 128.100.3.51 unreachable - need to frag (mtu 478), length 556
(I verified that these were for the things that were stalling with
'tcpdump -vv -ee'.)
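To separate locally generated "need to frag" messages from ones sent by some other host, the tcpdump text can be filtered on whether source and destination are the same address. This is a small Python sketch; the regular expression is written against the single line format quoted above, and `parse_frag_needed` is a hypothetical helper name.

```python
import re

FRAG_RE = re.compile(
    r"IP (\S+) > (\S+): ICMP (\S+) unreachable - need to frag \(mtu (\d+)\)"
)

def parse_frag_needed(line):
    """Extract fields from a tcpdump 'need to frag' line, or return None."""
    match = FRAG_RE.search(line)
    if not match:
        return None
    src, dst, target, mtu = match.groups()
    return {
        "src": src,
        "dst": dst,
        "target": target,
        "mtu": int(mtu),
        # Same source and destination is how the local messages look here.
        "local": src == dst,
    }

sample = ("IP 128.100.3.52 > 128.100.3.52: ICMP 128.100.3.51 unreachable"
          " - need to frag (mtu 478), length 556")
print(parse_frag_needed(sample))
```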
At the time that this was happening, I could see a lot of 'ip route show
table cache' entries like this:
128.100.3.58 from 66.96.18.208 dev ppp0 src 66.96.18.208
cache expires 286sec ipid 0xdeda mtu 552 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9 iif lo
There were a bunch of other 'mtu 552' routes. Flushing the routing cache
(with 'ip route flush cache; ip route show table cache' to verify that
it had flushed) did not change the situation; the problem continued and
the 'mtu 552' routes came back as fast as I could check (it appeared to
be the moment that the routing cache was repopulated, there they were).
In general the route cache appears to have strangely low MTUs listed
for the remote end of the tunnel (at least after the problem's
happened), eg:
128.100.3.58 from 66.96.18.208 dev ppp0
cache expires 203sec ipid 0xdeda mtu 934 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9
128.100.3.58 from 66.96.18.208 dev ppp0 src 66.96.18.208
cache expires 203sec ipid 0xdeda mtu 934 rtt 44ms rttvar 30ms ssthresh 7 cwnd 9 iif lo
I would normally expect this to be 1492, the PPP link MTU.
This does not seem to happen on the Fedora 14 kernel (the one where
the problem doesn't happen). There the route is listed as:
128.100.3.58 from 66.96.18.208 dev ppp0
cache mtu 1492 advmss 1452 hoplimit 64
128.100.3.58 from 66.96.18.208 dev ppp0
cache mtu 1492 advmss 1452 hoplimit 64
(to quote what I *think* are the relevant entries.)
Since things may be pointing towards routing cache oddities, I should
mention that I have a somewhat peculiar policy based routing setup
involving this GRE tunnel. While it's been working fine for literally
years, it's possible that recent changes in the area changed something
so that peculiar things now happen; if you think it might be relevant, I
can provide a dump of the routing rules and explain the setup.
(Part of why this occurs to me now is that I know it's possible to have
a connection to 128.100.3.58 (the remote end of the tunnel) that runs
through the tunnel itself, and I've seen such routes appear in the 'ip
route show table cache' output.)
- cks
* Recursive routing causes MTU collapse (was Re: Bug? GRE tunnel periodically won't transmit some packets)
2011-11-10 5:16 ` Chris Siebenmann
@ 2011-11-21 0:23 ` Chris Siebenmann
0 siblings, 0 replies; 11+ messages in thread
From: Chris Siebenmann @ 2011-11-21 0:23 UTC (permalink / raw)
To: Eric Dumazet, netdev; +Cc: cks
I believe I've identified the root cause of my GRE tunnel packet
transmission problems. The short summary is that I have 'recursive'
routing, where traffic for the tunnel endpoint can nominally be routed
over the tunnel itself. In current kernel versions, when a packet for
the tunnel endpoint is actually routed over the tunnel the path MTUs
determined for the endpoint and the tunnel both collapse down to very
small values.
Here is the routing I have. First, the links themselves, for the DSL
PPPoE device and the GRE tunnel:
3: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UNKNOWN qlen 3
link/ppp
inet 66.96.18.208 peer 66.96.31.6/32 scope global ppp0
5: extun: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN
link/gre 66.96.18.208 peer 128.100.3.58
inet 128.100.3.52/32 scope global extun
(My IPSec policy forces GRE traffic between 128.100.3.58 and
66.96.18.208 to be encrypted in 'esp/tunnel' mode.)
Now the recursion. To reach other machines on 128.100.3.0/24, I route
128.100.3.0/24 over the GRE tunnel:
; ip route list match 128.100.3.51
default dev ppp0 scope link
128.100.3.0/24 dev extun scope link
I also have policy based routing set to force traffic with an IP
origin of 66.96.18.208 out over the PPP link and traffic with an
IP origin of 128.100.3.52 out the GRE tunnel.
With this setup in place, if I do anything that tries to talk to
128.100.3.58 (such as ping or ssh) what I get is an immediate path
MTU collapse for the 66.96.18.208 -> 128.100.3.58 link used by the
GRE tunnel, ending when the path MTU for 66.96.18.208 -> 128.100.3.58
reaches 552 octets. At this point various things choke (I am guessing
because the GRE tunnel expects a minimum MTU of 576 octets).
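As a way to picture the collapse, here is a toy model in Python. It is emphatically not the kernel's logic: it simply assumes that every pass of an endpoint-bound packet through the tunnel lowers the cached path MTU by a fixed amount until a floor is reached. The step of 80 is an assumption, picked only because the cached values reported earlier in this thread (934, 854, 774, 614) differ by multiples of 80; the floor of 552 is the value where the collapse is observed to end.

```python
def collapse(start_mtu, overhead, floor):
    """Toy model: subtract `overhead` per pass until the MTU hits `floor`."""
    mtu = start_mtu
    values = [mtu]
    while mtu > floor:
        mtu = max(mtu - overhead, floor)
        values.append(mtu)
    return values

# Assumed numbers, see the note above: start at an observed cache value,
# step by 80, stop at the observed final value of 552.
print(collapse(934, 80, 552))
```

With these assumed numbers the model passes through 934, 854, 774 and 614 on its way down to 552, which is at least consistent with the route cache snapshots, though it does not explain what the real per-pass reduction corresponds to.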
If I add a host route for 128.100.3.58 that forces traffic for it
through ppp0 I can mostly avoid this route collapse:
; ip route list exact 128.100.3.58
128.100.3.58 dev ppp0 scope link src 66.96.18.208 mtu 1492
However, even with this if I explicitly force traffic for 128.100.3.58
over the GRE tunnel (such as by specifying the IP source address
so as to make my policy based routing kick in) I still see the MTU
collapse. Using 'mtu lock 1492' instead of plain 'mtu 1492' on this
host-based route does not appear to change anything.
This did not happen back in kernel 2.6.35.14 (the Fedora 14 kernel) and
previous kernels (going back years). In that kernel everything was happy
even without the ppp0-forcing host route for 128.100.3.58 and I could
talk to 128.100.3.58 over the GRE tunnel without causing any path MTU
changes (and without problems in general).
(This always made my head hurt a little bit but since it worked,
I didn't worry about it.)
- cks