From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dominique Martinet Subject: tcp hang when socket fills up ? Date: Fri, 6 Apr 2018 11:07:20 +0200 Message-ID: <20180406090720.GA31845@nautica> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit To: netdev@vger.kernel.org Return-path: Received: from nautica.notk.org ([91.121.71.147]:35534 "EHLO nautica.notk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751367AbeDFJQS (ORCPT ); Fri, 6 Apr 2018 05:16:18 -0400 Content-Disposition: inline Sender: netdev-owner@vger.kernel.org List-ID: (current kernel: vanilla 4.14.29) I've been running into troubles recently where a sockets "fills up" (as in, select() will no longer return it in its outfd / attempting to write to it after setting it to NONBLOCK will return -EWOULDBLOCK) and it doesn't seem to ever "unfill" until the tcp connexion timeout. The previous time I pushed it down to the application for not trying to read the socket either as I assume the buffers could be shared and just waiting won't take data out, but this time I'm a bit more skeptical as socat waits for the fd in both read and write... Let me take a minute to describe my setup, I don't think that how the socket was created matters but it might be interesting: - I have two computers behind NATs, no port forwarding on either side - One (call it C for client) runs ssh with a proxycommand ncat/socat to control the source port, e.g. $ ssh -o ProxyCommand="socat stdio tcp::,sourceport=" server - The server runs another socat to connect to that and forwards to ssh locally, e.g. $ socat tcp::,sourceport= tcp:127.0.0.1:22 (yes, both are connect() calls, and that just works™ thanks to initial syn replay and conntrack on routers) When things stall, the first socat is in a select with both fd in reading, so it's waiting data. The second socat has data coming from ssh and is in a select with the client-facing socket in both read and write - that select never returns so the socket is not available for reading or writing and does not free up until the connection eventually times out a few minutes later. At this point, I only see tcp replays in tcpdump/wireshark. I've compared dumps from both sides and there are no lost packets, only reordering - there always is a batch of acks that were sent regularily coming in shortly before the hang. Here's the trace on the server: 16:49:26.735042 IP .13317 > .31872: Flags [.], seq 70476:71850, ack 4190, win 307, options [nop,nop,TS val 1313937641 ecr 1617129473], length 1374 16:49:26.735046 IP .13317 > .31872: Flags [.], seq 71850:73224, ack 4190, win 307, options [nop,nop,TS val 1313937641 ecr 1617129473], length 1374 16:49:26.735334 IP .31872 > .13317: Flags [.], ack 41622, win 918, options [nop,nop,TS val 1617129478 ecr 1313937609], length 0 16:49:26.736005 IP .31872 > .13317: Flags [.], ack 42996, win 940, options [nop,nop,TS val 1617129478 ecr 1313937609], length 0 16:49:26.736402 IP .13317 > .31872: Flags [.], seq 73224:74598, ack 4190, win 307, options [nop,nop,TS val 1313937643 ecr 1617129473], length 1374 16:49:26.736408 IP .13317 > .31872: Flags [.], seq 74598:75972, ack 4190, win 307, options [nop,nop,TS val 1313937643 ecr 1617129473], length 1374 16:49:26.738561 IP .31872 > .13317: Flags [.], ack 44370, win 963, options [nop,nop,TS val 1617129482 ecr 1313937616], length 0 16:49:26.739539 IP .31872 > .13317: Flags [.], ack 45744, win 986, options [nop,nop,TS val 1617129482 ecr 1313937616], length 0 16:49:26.739882 IP .31872 > .13317: Flags [.], ack 47118, win 1008, options [nop,nop,TS val 1617129484 ecr 1313937617], length 0 16:49:26.740255 IP .31872 > .13317: Flags [.], ack 48492, win 1031, options [nop,nop,TS val 1617129484 ecr 1313937617], length 0 16:49:26.746756 IP .31872 > .13317: Flags [.], ack 49866, win 1053, options [nop,nop,TS val 1617129493 ecr 1313937627], length 0 16:49:26.747923 IP .31872 > .13317: Flags [.], ack 51240, win 1076, options [nop,nop,TS val 1617129494 ecr 1313937627], length 0 16:49:26.749083 IP .31872 > .13317: Flags [.], ack 52614, win 1099, options [nop,nop,TS val 1617129495 ecr 1313937629], length 0 16:49:26.750171 IP .31872 > .13317: Flags [.], ack 53988, win 1121, options [nop,nop,TS val 1617129496 ecr 1313937629], length 0 16:49:26.750808 IP .31872 > .13317: Flags [.], ack 55362, win 1144, options [nop,nop,TS val 1617129497 ecr 1313937629], length 0 16:49:26.754648 IP .31872 > .13317: Flags [.], ack 56736, win 1167, options [nop,nop,TS val 1617129500 ecr 1313937629], length 0 16:49:26.755985 IP .31872 > .13317: Flags [.], ack 58110, win 1189, options [nop,nop,TS val 1617129501 ecr 1313937630], length 0 16:49:26.758513 IP .31872 > .13317: Flags [.], ack 59484, win 1212, options [nop,nop,TS val 1617129502 ecr 1313937630], length 0 16:49:26.759096 IP .31872 > .13317: Flags [.], ack 60858, win 1234, options [nop,nop,TS val 1617129503 ecr 1313937635], length 0 16:49:26.759421 IP .31872 > .13317: Flags [.], ack 62232, win 1257, options [nop,nop,TS val 1617129503 ecr 1313937635], length 0 16:49:26.759755 IP .31872 > .13317: Flags [.], ack 63606, win 1280, options [nop,nop,TS val 1617129504 ecr 1313937636], length 0 16:49:26.760653 IP .31872 > .13317: Flags [.], ack 64980, win 1302, options [nop,nop,TS val 1617129505 ecr 1313937636], length 0 16:49:26.761453 IP .31872 > .13317: Flags [.], ack 66354, win 1325, options [nop,nop,TS val 1617129506 ecr 1313937638], length 0 16:49:26.762199 IP .31872 > .13317: Flags [.], ack 67728, win 1348, options [nop,nop,TS val 1617129507 ecr 1313937638], length 0 16:49:26.763547 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 67728, win 1348, options [nop,nop,TS val 1617129507 ecr 1313937638], length 36 16:49:26.763553 IP .31872 > .13317: Flags [.], ack 70476, win 1393, options [nop,nop,TS val 1617129508 ecr 1313937639], length 0 16:49:26.764298 IP .31872 > .13317: Flags [.], ack 73224, win 1438, options [nop,nop,TS val 1617129509 ecr 1313937641], length 0 16:49:26.764676 IP .31872 > .13317: Flags [.], ack 75972, win 1444, options [nop,nop,TS val 1617129510 ecr 1313937643], length 0 16:49:26.807754 IP .13317 > .31872: Flags [.], seq 75972:77346, ack 4190, win 307, options [nop,nop,TS val 1313937714 ecr 1617129473], length 1374 16:49:26.876467 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617129620 ecr 1313937714], length 0 16:49:27.048760 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313937955 ecr 1617129473], length 1374 16:49:27.051791 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617129762 ecr 1313937714], length 36 16:49:27.076444 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617129822 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:27.371182 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617130018 ecr 1313937714], length 36 16:49:27.519862 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313938426 ecr 1617129473], length 1374 16:49:27.547662 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617130293 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:27.883372 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617130530 ecr 1313937714], length 36 16:49:28.511861 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313939418 ecr 1617129473], length 1374 16:49:28.538891 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617131285 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:28.907197 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617131554 ecr 1313937714], length 36 16:49:30.431864 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313941338 ecr 1617129473], length 1374 16:49:30.459127 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617133204 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:30.955388 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617133602 ecr 1313937714], length 36 16:49:34.207879 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313945114 ecr 1617129473], length 1374 16:49:34.235726 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617136981 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:35.256285 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617137954 ecr 1313937714], length 36 16:49:42.143864 IP .13317 > .31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313953050 ecr 1617129473], length 1374 16:49:42.171531 IP .31872 > .13317: Flags [.], ack 77346, win 1444, options [nop,nop,TS val 1617144917 ecr 1313937714,nop,nop,sack 1 {32004:33378}], length 0 16:49:43.448262 IP .31872 > .13317: Flags [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617146146 ecr 1313937714], length 36 (I still have the pcap files if someone wants to see them, but I'd rather not give my work IP publicly so will send it privately on request) At this point, only the same 3 packets keep being replayed over and over... According to 'ss' the Send-Q isn't empty on either side, the client has some ~1k to send but the server has much more (87k) After increasing the window size through net.*wmem sysctl it got stuck with over 1MB in Send-Q, which makes sense because the socket is full... So, what I don't get is, why are these acks continuously replayed? Given the timing it looks like the server doesn't take the client acks into account, despite having received that precise 33378 ack earlier and I believe it should accept a higher ack value anyway ? The ultimate question being, how can I go about debugging that? I'm working on getting perf probe/crash to work on the server so I can look at the kernel side now, I can reproduce this semi-easily so I'm sure I'll get down to it eventually, but if anyone has an idea I'm all ears. Thanks! -- Dominique Martinet | Asmadeus