* [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.12+
@ 2014-11-29 11:44 Thomas Jarosch
From: Thomas Jarosch @ 2014-11-29 11:44 UTC
  To: netdev; +Cc: Eric Dumazet


Hello,

we're in the process of updating production level machines
from kernel 3.4.101 to kernel 3.14.25. On one mail server
we noticed that emails destined for an IPSec tunnel sometimes
get stuck in the mail queue with TCP timeouts.

To make a long story short: When the VPN connection is initially
set up or renewed, the path MTU for the xfrm tunnel is undetermined.

As soon as a TCP client starts to send large packets,
it triggers path MTU detection. Some middlebox on the
way to the final server has a lower MTU and sends back
an "ICMP fragmentation needed" packet as normal.

With the old kernel, the packet size for the TCP connection inside
the xfrm tunnel gets adjusted and all is fine. With kernel v3.12+,
the connection stalls completely. Same thing with kernel v3.18-rc6.

We wrote a small tool to mimic postfix's TCP behavior (see attached file).
In the end it's a normal TCP client sending large packets.
The server side is just "socat - tcp4-listen:667".

If you run "socket_client" a second time, the path MTU
for the xfrm tunnel is already known and packets flow normally, too.
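
One way to check from user space whether the kernel already has a path MTU
for a connection is the Linux-specific IP_MTU socket option. The following is
a minimal sketch and not part of the attached tool: the helper name
query_path_mtu is made up for illustration, and it assumes Linux with glibc
headers. Whether the reported value reflects the xfrm tunnel's path MTU on a
given kernel is an assumption that would need to be verified.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the path MTU the kernel currently
 * associates with an IPv4 socket, or -1 on error (errno is set by
 * getsockopt). On Linux, IP_MTU is only valid on connected sockets.
 */
int query_path_mtu(int sockfd)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);

    if (getsockopt(sockfd, IPPROTO_IP, IP_MTU, &mtu, &len) != 0)
        return -1;

    return mtu;
}
```

Calling it right after connect() and again after each send() would show
whether and when the cached value drops, alongside the TCP_MAXSEG values
the test tool already prints.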


The "evil" commit in question is this one:
---------------------------------------------------------------------
commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 15 12:24:54 2013 -0700

    tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()

    sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO
    packets in the first place. (As a performance hint)

    Once we have GSO packets in write queue, we can not decide they are no
    longer GSO only because flow now uses a route which doesn't handle
    TSO/GSO.

    Core networking stack handles the case very well for us, all we need
    is keeping track of packet counts in MSS terms, regardless of
    segmentation done later (in GSO or hardware)

    Right now, if  tcp_fragment() splits a GSO packet in two parts,
    @left and @right, and route changed through a non GSO device,
    both @left and @right have pcount set to 1, which is wrong,
    and leads to incorrect packet_count tracking.

    This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU
    discovery")

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Reported-by: Maciej Żenczykowski <maze@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8fad1c1..d46f214 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -989,8 +989,7 @@ static void tcp_set_skb_tso_segs(const struct sock *sk, struct sk_buff *skb,
        /* Make sure we own this skb before messing gso_size/gso_segs */
        WARN_ON_ONCE(skb_cloned(skb));
 
-       if (skb->len <= mss_now || !sk_can_gso(sk) ||
-           skb->ip_summed == CHECKSUM_NONE) {
+       if (skb->len <= mss_now || skb->ip_summed == CHECKSUM_NONE) {
                /* Avoid the costly divide in the normal
                 * non-TSO case.
                 */
---------------------------------------------------------------------

When I revert it, even kernel v3.18-rc6 starts working.
But I doubt this is the root problem; the revert may just be hiding another issue.
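
To make the effect of the one-line change easier to see, here is a simplified
user-space model of the condition before and after the commit. It is a sketch
only: struct model_skb, pcount_before and pcount_after are made-up names, and
the rounded-up division in the multi-segment branch is an assumption based on
the commit message ("keeping track of packet counts in MSS terms"), since the
quoted diff does not show that branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the fields the quoted condition looks at. */
struct model_skb {
    unsigned int len;   /* bytes queued in this skb */
    bool csum_none;     /* models skb->ip_summed == CHECKSUM_NONE */
};

static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/* Condition before the commit: a socket that cannot do GSO always gets 1. */
unsigned int pcount_before(const struct model_skb *skb,
                           unsigned int mss_now, bool sk_can_gso)
{
    assert(mss_now > 0);
    if (skb->len <= mss_now || !sk_can_gso || skb->csum_none)
        return 1;
    /* Assumed: the packet count is the length in MSS units, rounded up. */
    return div_round_up(skb->len, mss_now);
}

/* Condition after the commit: sk_can_gso() is no longer consulted. */
unsigned int pcount_after(const struct model_skb *skb, unsigned int mss_now)
{
    assert(mss_now > 0);
    if (skb->len <= mss_now || skb->csum_none)
        return 1;
    return div_round_up(skb->len, mss_now);
}
```

In this model, a 4096-byte skb with an MSS of 1338 on a socket that cannot do
GSO counts as 1 packet before the commit and as 4 afterwards, while an skb
marked CHECKSUM_NONE counts as 1 in both versions.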

--- Sample output of socket_client using vanilla v3.12 kernel ---
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1370
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1370
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1370
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1370
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1338
[1417258063 SEND result: 4096, strerror: Success]
tcp max seg: res: 0, max_seg: 1338
*STUCK*
--------------------------------------------------------

The "machine" is running on KVM and using "virtio_net" as NIC driver.
I've played with the ethtool offload settings:

*** eth1 defaults ***
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off

*** eth1 working (no stalls) using vanilla kernel ***
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: off  <-- the magic switch
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

When I turn "tx-checksumming" back on, it fails again.
Though that is probably also just a side effect.

I can provide tcpdumps if needed, but they are of little help:
all you can see is that the kernel stops sending TCP packets
(and the outgoing TCP packets are encrypted inside ESP packets).


Any vague idea what might be the root cause?

I also tried reverting commit 4d53eff48b5f03ce67f4f301d6acca1d2145cb7a
("xfrm: Don't queue retransmitted packets if the original is still on the host")
but that didn't change the situation. In fact it wasn't even triggered.

Please CC me on replies. Thanks.

Best regards,
Thomas

[-- Attachment #2: socket_client.c --]
[-- Type: text/x-csrc, Size: 1327 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memset, strcpy, strerror */
#include <strings.h>  /* bzero */
#include <unistd.h>
#include <errno.h>
#include <time.h>

/*
    Remote server: socat - tcp4-listen:667
*/

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    bzero(&servaddr,sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr=inet_addr("192.168.12.254");
    servaddr.sin_port=htons(667);

    int result = connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
    if(result != 0)
    {
        perror("failed to connect");
        exit(1);
    }

    char sendbuf[4096];
    memset(sendbuf, 0, sizeof(sendbuf));
    strcpy(sendbuf, "NOOP\n");

    int max_seg = 0, get_res = 0;
    socklen_t max_seg_len = sizeof(max_seg);  /* getsockopt() expects a socklen_t */

    for (int i = 0; i < 10; ++i)
    {
        errno = 0;
        ssize_t send_res = send(sockfd, sendbuf, sizeof(sendbuf), 0);
        printf("[%ld SEND result: %zd, strerror: %s]\n", (long)time(NULL), send_res, strerror(errno));

        get_res = getsockopt(sockfd, SOL_TCP, TCP_MAXSEG, &max_seg, &max_seg_len);
        printf("tcp max seg: res: %d, max_seg: %d\n", get_res, max_seg);
    }

    printf("All sent.\n");

    close(sockfd);
    exit(0);
}
