From: Marcelo Ricardo Leitner
Subject: [tcp] Unable to report zero window when flooded with small packets
Date: Fri, 14 Jun 2013 10:11:33 -0300
Message-ID: <51BB1685.4070103@redhat.com>
Reply-To: mleitner@redhat.com
To: netdev@vger.kernel.org
Cc: Jiri Pirko, kaber@trash.net

Hi there,

First of all, sorry for the long email, but this is lengthy and I
couldn't narrow it down. My bisect-fu is failing me.

We got a report saying that after this commit:

commit 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723
Author: Patrick McHardy
Date:   Thu Mar 20 16:11:27 2008 -0700

    [TCP]: Fix shrinking windows with window scaling

    When selecting a new window, tcp_select_window() tries not to shrink
    the offered window by using the maximum of the remaining offered
    window size and the newly calculated window size. The newly
    calculated window size is always a multiple of the window scaling
    factor, the remaining window size however might not be since it
    depends on rcv_wup/rcv_nxt. This means we're effectively shrinking
    the window when scaling it down.
    (...)

Linux is unable to advertise a zero window when using the window scale
option. I tested it under the current net(-next) trees and I can
reproduce the issue.

Consider the following load type:
- One TCP peer sends several tiny packets.
- The other peer acts slowly: it won't read its side of the socket for
  a long while.

If the tiny packets sent by the client carry a payload smaller than
(1 << window scale) bytes, the server is never able to update the
available window, as doing so would always count as shrinking the
window.
As that patch blocks window shrinking when window scaling is in use,
the server will never advertise a zero window, even when its buffer is
full. Instead, it simply starts dropping these packets, and the client
will think the server went unreachable, timing out the connection if
the application doesn't read the socket soon enough.

In order to speed up the testing, I'm disabling receive buffer
moderation by setting SO_RCVBUF to 64k after accept(): so we allow a
non-optimal window scale option. Also, when I want to disable window
scaling, I just set TCP_WINDOW_CLAMP before listen(). All flow was
client->server during the tests.

So, for this issue, small packets + window scale option:
v3.0 stock: doesn't work
v3.0 with that commit reverted: works
v3.2 with that commit reverted: doesn't work either
net-next stock: doesn't work
net-next reverted: doesn't work

Further testing revealed that v3.3 and newer also have the issue when
NOT using the window scale option. So, for this other issue:
v3.2: it's fine
v3.3 with 9f42f126154786e6e76df513004800c8c633f020 reverted: works
net-next stock: doesn't work
net-next reverted: doesn't work

commit 9f42f126154786e6e76df513004800c8c633f020
Author: Ian Campbell
Date:   Thu Jan 5 07:13:39 2012 +0000

    net: pack skb_shared_info more efficiently

    nr_frags can be 8 bits since 256 is plenty of fragments. This allows
    it to be packed with tx_flags.

    Also by moving ip6_frag_id and dataref (both 4 bytes) next to each
    other we can avoid a hole between ip6_frag_id and frag_list on 64
    bit systems.

With both commits reverted:
v3.3: doesn't work when using WS; works fine when not using it
net-next: doesn't work either

Clearly I'm missing something here; it seems there is more to this,
but I can't track it down. Perhaps a corner case with rx buffer
collapsing?
    57 packets pruned from receive queue because of socket buffer overrun
    15 packets pruned from receive queue
    243 packets collapsed in receive queue due to low socket buffer
    TCPRcvCoalesce: 6019

I can provide a reproducer and/or captures if it helps.

Thanks,
Marcelo