From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, "David S. Miller" <davem@davemloft.net>
Subject: [patch 30/62] tcp: Clear probes_out more aggressively in tcp_ack().
Date: Wed, 30 Jul 2008 16:58:55 -0700
Message-ID: <20080730235855.GD12896@suse.de>
In-Reply-To: <20080730234915.GA12426@suse.de>
[-- Attachment #1: tcp-clear-probes_out-more-aggressively-in-tcp_ack.patch --]
[-- Type: text/plain, Size: 11329 bytes --]
2.6.26 -stable review patch. If anyone has any objections, please let
us know.
------------------
From: David S. Miller <davem@davemloft.net>
[ Upstream commit 4b53fb67e385b856a991d402096379dab462170a ]
This is based upon an excellent bug report from Eric Dumazet.
tcp_ack() should clear ->icsk_probes_out even if there are packets
outstanding. Otherwise if we get a sequence of ACKs while we do have
packets outstanding over and over again, we'll never clear the
probes_out value and eventually think the connection is too sick and
we'll reset it.
This appears to be some "optimization" added to tcp_ack() in the 2.4.x
timeframe. In 2.2.x, probes_out is pretty much always cleared by
tcp_ack().
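The failure mode described above can be illustrated with a toy model. This is only a sketch, not kernel code: struct toy_sock, toy_ack_old(), toy_ack_new() and toy_would_reset() are hypothetical names, and the rule "the connection is reset once the counter exceeds the limit" is an assumption made for the illustration.

```c
#include <stdbool.h>

/* Toy stand-in for the two fields involved; not the kernel structures. */
struct toy_sock {
	unsigned int probes_out;	/* models icsk->icsk_probes_out */
	unsigned int packets_out;	/* models tp->packets_out */
};

/* Old behaviour: the counter is cleared only when nothing was
 * outstanding at the time the ACK arrived (the no_queue path). */
static void toy_ack_old(struct toy_sock *sk)
{
	if (sk->packets_out == 0)
		sk->probes_out = 0;
}

/* Patched behaviour: the counter is cleared on every accepted ACK. */
static void toy_ack_new(struct toy_sock *sk)
{
	sk->probes_out = 0;
}

/* Repeat "probe timer fires, then an ACK arrives while one packet is
 * still outstanding" for the given number of rounds.  Returns true if
 * the counter ever exceeds limit (assumed reset condition). */
static bool toy_would_reset(void (*ack)(struct toy_sock *),
			    unsigned int rounds, unsigned int limit)
{
	struct toy_sock sk = { .probes_out = 0, .packets_out = 1 };
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		sk.probes_out++;		/* a probe was sent */
		if (sk.probes_out > limit)
			return true;		/* connection considered dead */
		ack(&sk);			/* ACK with data still in flight */
	}
	return false;
}
```

With a limit of 15, toy_would_reset(toy_ack_old, 100, 15) reports a reset, while toy_would_reset(toy_ack_new, 100, 15) does not, because the patched variant never lets the counter accumulate across ACKs.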
Here is Eric's original report:
----------------------------------------
Apparently, in some situations we can reset TCP connections within a couple of seconds when some frames are lost.
To reproduce the problem, please try the following program on linux-2.6.25.*
Set up some iptables rules to allow two frames per second to be sent on the loopback interface to TCP destination port 12000:
iptables -N SLOWLO
iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT
iptables -A SLOWLO -j DROP
iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO
Then run the attached program and see the output :
# ./loop
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15)
write(): Connection timed out
wrote 890 bytes but was interrupted after 9 seconds
ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455
Exiting read() because no data available (4000 ms timeout).
read 860 bytes
While this TCP session is making progress (sending frames with 50 bytes of payload every 500 ms), the Linux TCP stack decides to reset it once tcp_retries2 is reached (default value: 15).
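As a rough check of the timing, the ss samples above show the persist counter going 1, 3, 5, ... at about one sample per second. The sketch below assumes that growth rate and assumes the connection is aborted once the counter exceeds the limit; seconds_until_exceeds() is a hypothetical helper written only for this arithmetic.

```c
#include <limits.h>

/* Whole seconds needed for a counter that starts at start and grows by
 * per_second each second to first exceed limit.  Returns UINT_MAX if
 * the counter never grows. */
static unsigned int seconds_until_exceeds(unsigned int start,
					  unsigned int per_second,
					  unsigned int limit)
{
	unsigned int secs = 0;
	unsigned int count = start;

	if (per_second == 0)
		return UINT_MAX;

	while (count <= limit) {
		count += per_second;
		secs++;
	}
	return secs;
}
```

Under these assumptions, seconds_until_exceeds(1, 2, 15) gives 8 seconds after the first sample, which is roughly consistent with the write() failing after about 9 seconds in the output above.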
tcpdump :
15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257
15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257
15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257
15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257
15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257
15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257
15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257
15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257
15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257
15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257
15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257
15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257
15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257
15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257
15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257
15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257
15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257
15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257
15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257
15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257
15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257
15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257
15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257
15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257
15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257
15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257
15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257
15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257
15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257
15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257
15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257
15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257
15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257
15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257
15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257
15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0
Source of program:
/*
 * Small producer/consumer program.
 * Sets up a listener on 127.0.0.1:12000.
 * Forks a child.
 * The child connects to 127.0.0.1 and sends 10 bytes on this TCP socket every 100 ms.
 * The parent accepts the connection and reads all data.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>	/* system() */
#include <time.h>
#include <sys/poll.h>
int port = 12000;
char buffer[4096];
int main(int argc, char *argv[])
{
	int lfd = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in socket_address;
	time_t t0, t1;
	int on = 1, sfd, res;
	unsigned long total = 0;
	socklen_t alen = sizeof(socket_address);
	pid_t pid;
	time(&t0);
	socket_address.sin_family = AF_INET;
	socket_address.sin_port = htons(port);
	socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (lfd == -1) {
		perror("socket()");
		return 1;
	}
	setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int));
	if (bind(lfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
		perror("bind");
		close(lfd);
		return 1;
	}
	if (listen(lfd, 1) == -1) {
		perror("listen()");
		close(lfd);
		return 1;
	}
	pid = fork();
	if (pid == 0) {
		/* Child: producer, writes 10 bytes every 100 ms. */
		int i, cfd = socket(AF_INET, SOCK_STREAM, 0);
		close(lfd);
		if (connect(cfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
			perror("connect()");
			return 1;
		}
		for (i = 0 ; ;) {
			res = write(cfd, "blablabla\n", 10);
			if (res > 0) total += res;
			else if (res == -1) {
				perror("write()");
				break;
			} else break;
			usleep(100000);
			/* Dump the socket state roughly once per second. */
			if (++i == 10) {
				system("ss -on dst 127.0.0.1:12000");
				i = 0;
			}
		}
		time(&t1);
		fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0));
		system("ss -on | grep 127.0.0.1:12000");
		close(cfd);
		return 0;
	}
	/* Parent: consumer, reads until EOF, error or a 4 second idle timeout. */
	sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen);
	if (sfd == -1) {
		perror("accept");
		return 1;
	}
	close(lfd);
	while (1) {
		struct pollfd pfd[1];
		pfd[0].fd = sfd;
		pfd[0].events = POLLIN;
		if (poll(pfd, 1, 4000) == 0) {
			fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n");
			break;
		}
		res = read(sfd, buffer, sizeof(buffer));
		if (res > 0) total += res;
		else if (res == 0) break;
		else {
			/* Stop on a read error instead of looping forever. */
			perror("read()");
			break;
		}
	}
	fprintf(stderr, "read %lu bytes\n", total);
	close(sfd);
	return 0;
}
----------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
net/ipv4/tcp_input.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3273,6 +3273,7 @@ static int tcp_ack(struct sock *sk, stru
 	 * log. Something worked...
 	 */
 	sk->sk_err_soft = 0;
+	icsk->icsk_probes_out = 0;
 	tp->rcv_tstamp = tcp_time_stamp;
 	prior_packets = tp->packets_out;
 	if (!prior_packets)
@@ -3305,8 +3306,6 @@ static int tcp_ack(struct sock *sk, stru
 	return 1;
 
 no_queue:
-	icsk->icsk_probes_out = 0;
-
 	/* If this ack opens up a zero window, clear backoff. It was
 	 * being used to time the probes, and is probably far higher than
 	 * it needs to be for normal retransmission.
--