* TCP_CORK 200ms maximum cork time -- expected behaviour?
@ 2004-07-01 12:47 Michael Kerrisk
From: Michael Kerrisk @ 2004-07-01 12:47 UTC
To: netdev; +Cc: michael.kerrisk
Gidday,
The TCP_CORK socket option allows us to perform
multiple write()s (or send()s or sendfile()s)
while delaying the transmission of an outgoing
TCP segment until the option is disabled (or
enough data accumulates to fill an MSS-sized
segment, or the socket is closed).
All is fine and good, but there's one point I'm
puzzled about: even when TCP_CORK is set,
buffered data will still be transmitted
after a 200 millisecond delay (the delay
counts from the time that the first corked byte
was written), even if TCP_CORK is still set.
So, I'm wondering:
1. Is this intended behaviour, or simply an
outgrowth of the combined implementations of
TCP_CORK and TCP_NAGLE_OFF?
2. If it's intended behaviour, what is the
rationale for the ceiling time on corking?
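For reference, the cork / write / uncork pattern described above can be sketched as follows. This is a minimal sketch, assuming Linux (TCP_CORK is Linux-specific) and an already connected TCP socket; set_cork() and write_corked() are hypothetical helper names, not part of any library.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_CORK (Linux-specific) */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enable (on != 0) or disable TCP_CORK.
   Returns 0 on success, -1 on error (errno set by setsockopt()). */
static int set_cork(int sfd, int on)
{
    int optval = on ? 1 : 0;

    return setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));
}

/* Hypothetical helper: cork the socket, write() each string in 'pieces',
   then uncork. While corked, the kernel holds back partial segments so
   the pieces can be coalesced; clearing TCP_CORK pushes out whatever is
   still queued. Returns the total number of bytes written, or -1 on
   error (a short write counts as an error; the socket is then left
   corked). */
static ssize_t write_corked(int sfd, const char *const pieces[], int npieces)
{
    ssize_t total = 0;

    if (set_cork(sfd, 1) == -1)
        return -1;

    for (int i = 0; i < npieces; i++) {
        size_t len = strlen(pieces[i]);

        if (write(sfd, pieces[i], len) != (ssize_t) len)
            return -1;
        total += (ssize_t) len;
    }

    if (set_cork(sfd, 0) == -1)     /* flush any partial segment now */
        return -1;
    return total;
}
```

Note that, per the behaviour reported in this message, queued data may nonetheless leave before set_cork(sfd, 0) is called if roughly 200 ms pass after the first corked byte is written.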
Cheers,
Michael
PS I first observed this behaviour quite some time
back, but I've verified that it is still current
(2.4.26 and 2.6.7 kernels). (In passing: of
course, similar behaviour occurs with MSG_MORE on
TCP sockets.)
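The PS mentions MSG_MORE. According to the send(2) manual page, this Linux-specific flag gives the same effect as TCP_CORK, but on a per-call basis. The following is a minimal sketch, assuming Linux and a connected TCP socket; send_pieces_more() is a hypothetical name.

```c
#define _GNU_SOURCE          /* MSG_MORE is Linux-specific */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send() each string in 'pieces', passing MSG_MORE
   on every call except the last, so the kernel may coalesce the pieces
   as it would with TCP_CORK set. The final send() without the flag lets
   the queued data go out. Returns the total number of bytes sent, or -1
   on error (a short send counts as an error). */
static ssize_t send_pieces_more(int sfd, const char *const pieces[],
                                int npieces)
{
    ssize_t total = 0;

    for (int i = 0; i < npieces; i++) {
        size_t len = strlen(pieces[i]);
        int flags = (i < npieces - 1) ? MSG_MORE : 0;

        if (send(sfd, pieces[i], len, flags) != (ssize_t) len)
            return -1;
        total += (ssize_t) len;
    }
    return total;
}
```

As the PS notes, a similar time ceiling is observed with MSG_MORE, so long gaps between the calls above would presumably let earlier pieces go out on their own.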
From: Michael T Kerrisk @ 2004-08-20 14:00 UTC
To: netdev; +Cc: michael.kerrisk
Gidday,
I tried posting this several weeks back, but got no response.
I'll try again, this time with test programs (see below) that
demonstrate what I'm seeing (sample output is also below).
The TCP_CORK socket option allows us to perform multiple
write()s (or send()s or sendfile()s) while delaying the
transmission of an outgoing TCP segment until the option is
disabled (or enough data accumulates to fill an MSS-sized
segment, or the socket is closed).
All is fine and good, but there's one point I'm puzzled
about: even when TCP_CORK is set, buffered data will still be
transmitted after a 200 millisecond delay (the delay counts
from the time that the first corked byte was written),
**even if TCP_CORK is still set**. So, I'm wondering:
1. Is this intended behaviour, or simply an
outgrowth of the combined implementations of
TCP_CORK and TCP_NAGLE_OFF?
2. If it's intended behaviour, what is the
rationale for the ceiling time on corking?
I first observed this behaviour quite some time back, but
I've verified that it is still current (2.4.26 and 2.6.8.1
kernels). (In passing: of course, similar behaviour occurs
with MSG_MORE on TCP sockets.)
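Since sendfile() is mentioned above: the tcp(7) manual page describes TCP_CORK as useful for prepending headers before calling sendfile(). The following is a minimal sketch of that pattern, assuming Linux, a connected TCP socket, and a regular file opened for reading; send_header_and_file() is a hypothetical name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>     /* TCP_CORK (Linux-specific) */
#include <string.h>
#include <sys/sendfile.h>    /* Linux sendfile() */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send a text header, then 'count' bytes of the
   file open on 'fd' (from offset 0), with TCP_CORK set around both so
   the header need not go out in an undersized segment of its own.
   Returns 0 on success, -1 on error (the socket may be left corked). */
static int send_header_and_file(int sfd, const char *hdr, int fd, size_t count)
{
    int one = 1, zero = 0;
    off_t offset = 0;
    size_t hlen = strlen(hdr);

    if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &one, sizeof(one)) == -1)
        return -1;
    if (write(sfd, hdr, hlen) != (ssize_t) hlen)
        return -1;

    /* sendfile() may transfer fewer bytes than requested, so loop.
       It advances 'offset' itself; fd's own file offset is unchanged. */
    while (count > 0) {
        ssize_t n = sendfile(sfd, fd, &offset, count);

        if (n <= 0)          /* error, or file shorter than 'count' */
            return -1;
        count -= (size_t) n;
    }

    /* Uncork: push out any remaining partial segment immediately */
    return setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &zero, sizeof(zero));
}
```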
Here's what I see using my two test programs:
tcp_cork_receive port
This binds to a port, accepts a connection
and then reads blocks of data, displaying the size of
each block along with the time that the read() completed.
tcp_cork_send [options] server port num-writes buf-size
Connect to server/port, perform specified number
('num-writes') of writes, each containing 'buf-size'
bytes. By default, this program enables TCP_CORK
on the socket.
Various options are provided, but the only one
needed for the test is '-d usecs' which specifies
a number of microseconds to usleep() between writes.
In the following (run on 2.6.8.1), tcp_cork_send is used to
write 100 bytes, one at a time, with a 10 millisecond delay
between writes:
$ ./tcp_cork_receive 9999 &
[1] 8868
$ ./tcp_cork_send -d 10000 localhost 9999 100 1
[PID 8868] 1093009988.950: Receiver accepted connecton
[PID 8869] 1093009988.951: Enabled TCP_CORK
[PID 8869] 1093009988.951: TCP_CORK=1
[PID 8868] 1093009989.152: [received 17 bytes]
[PID 8868] 1093009989.359: [received 17 bytes]
[PID 8868] 1093009989.563: [received 17 bytes]
[PID 8868] 1093009989.767: [received 17 bytes]
[PID 8868] 1093009989.971: [received 17 bytes]
[PID 8869] 1093009990.154: Completed writes
[PID 8868] 1093009990.155: [received 15 bytes]
[1]+ Done ./tcp_cork_receive 9999
The "received" messages appear every 200 milliseconds, even
though the sender did not disable TCP_CORK. Based on what I'd
read/heard about TCP_CORK, I would have expected to see only
one "received" after the sender had closed the socket. But,
instead, there is clearly a 200 millisecond ceiling on corkage.
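The arithmetic behind "every 200 milliseconds" can be checked against the log above. This sketch only copies the timestamps and byte counts from the 2.6.8.1 run; gap_ms(), total_bytes() and ms_per_write() are illustrative names. It works out to gaps of roughly 201 to 207 ms between successive reads, 100 bytes in total, about 12 ms per write()+usleep(10000) iteration (hence roughly 17 rather than 20 bytes per flush), and a final 15-byte read only about 184 ms after the previous one, presumably flushed by the sender's close().

```c
#include <stddef.h>

/* Timestamps (seconds since the Epoch) copied from the run above.
   Element 0 is the sender's "Enabled TCP_CORK" line; elements 1..6
   are the receiver's "[received N bytes]" lines. */
static const double event_time[] = {
    1093009988.951,
    1093009989.152, 1093009989.359, 1093009989.563,
    1093009989.767, 1093009989.971, 1093009990.155
};

/* Byte counts from the six "[received N bytes]" lines */
static const int bytes_received[] = { 17, 17, 17, 17, 17, 15 };

#define NREADS (sizeof(bytes_received) / sizeof(bytes_received[0]))

/* Milliseconds between event i and event i + 1 (0 <= i < NREADS) */
static double gap_ms(size_t i)
{
    return (event_time[i + 1] - event_time[i]) * 1000.0;
}

/* Total bytes seen by the receiver: should equal 100 writes x 1 byte */
static int total_bytes(void)
{
    int sum = 0;

    for (size_t i = 0; i < NREADS; i++)
        sum += bytes_received[i];
    return sum;
}

/* Average milliseconds per write() + usleep(10000) iteration, from the
   "Enabled TCP_CORK" and "Completed writes" timestamps (100 writes) */
static double ms_per_write(void)
{
    return (1093009990.154 - 1093009988.951) * 1000.0 / 100.0;
}
```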
Cheers,
Michael
/* tcp_cork_receive.c */
#include <sys/types.h>
#include <sys/socket.h>     /* socket(), bind(), listen(), accept() */
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define errMsg(msg) { perror(msg); }
#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }
#define usageErr(msg, progName) \
    { fprintf(stderr, "Usage: "); \
      fprintf(stderr, msg, progName); \
      exit(EXIT_FAILURE); }
/* Print PID and current time (seconds.milliseconds) as a line prefix */
static void
traceInfo(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
    printf("[PID %ld] %8.3f: ", (long) getpid(),
           tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */
int
main(int argc, char *argv[])
{
    int lfd, sfd;
    ssize_t numRead;
#define BUF_SIZE 100000
    char buf[BUF_SIZE];
    int optval;
    struct sockaddr_in svaddr;

    if (argc != 2 || strcmp(argv[1], "--help") == 0) {
        fprintf(stderr, "%s port\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1) errExit("socket");

    memset(&svaddr, 0, sizeof(struct sockaddr_in));
    svaddr.sin_family = AF_INET;
    svaddr.sin_port = htons(atoi(argv[1]));
    svaddr.sin_addr.s_addr = htonl(INADDR_ANY);

    optval = 1;
    if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                   sizeof(optval)) == -1) errExit("setsockopt");
    if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in))
            == -1) errExit("bind");
    if (listen(lfd, 5) == -1) errExit("listen");

    /* Accept a single connection */
    sfd = accept(lfd, NULL, NULL);
    if (sfd == -1) errExit("accept");
    traceInfo();
    printf("Receiver accepted connection\n");

    /* Report the size and arrival time of each block until EOF */
    for (;;) {
        numRead = read(sfd, buf, BUF_SIZE);
        if (numRead == -1) errExit("read");
        if (numRead == 0)
            break;
        traceInfo();
        /* printf("[received %ld bytes] %.*s\n", (long) numRead,
                  (int) numRead, buf); */
        printf("[received %ld bytes]\n", (long) numRead);
    }

    close(lfd);
    close(sfd);
    exit(EXIT_SUCCESS);
} /* main */
/* tcp_cork_send.c */
#define _XOPEN_SOURCE 500
#include <sys/types.h>
#include <sys/socket.h>     /* socket(), connect(), setsockopt() */
#include <unistd.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <netdb.h>          /* gethostbyname(); the original included the
                               author's "inet_sockets.h" library header,
                               but nothing from it is used here */
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>

typedef enum { FALSE, TRUE } Boolean;

#define errMsg(msg) { perror(msg); }
#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }
#define fatalErr(msg) { fprintf(stderr, "%s\n", msg); \
                        exit(EXIT_FAILURE); }
/* Print PID and current time (seconds.milliseconds) as a line prefix */
static void
traceInfo(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
    printf("[PID %ld] %8.3f: ", (long) getpid(),
           tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */
static void
usageError(char *progName, char *msg)
{
    if (msg != NULL)
        fprintf(stderr, "%s\n", msg);
    fprintf(stderr,
            "%s [options] server port num-write buf-size\n"
            "\tnum-write Number of writes\n"
            "\tbuf-size Size of buffer for each write\n"
            "\tOptions are:\n"
            "\t\t-d usecs Microsecs delay between each write\n"
            "\t\t-n Don't enable TCP_CORK before writes\n"
            "\t\t-s nsecs Sleep for 'nsecs' seconds before closing socket\n"
            "\t\t-u Disable TCP_CORK immediately after sending\n"
            "\t\t-v Verbose reporting of (delayed) writes\n"
            ,
            progName);
    exit(EXIT_FAILURE);
} /* usageError */
int
main(int argc, char *argv[])
{
    int numWrites, j, sfd;
    useconds_t delayUsecs;
    size_t bufSize;
    char *buf;
    int optval;
    int opt;
    int finalSleepSecs;
    Boolean nocork, uncork, verbose;
    socklen_t optlen;
    struct sockaddr_in svaddr;
    struct hostent *h;
    struct in_addr **addrpp;

    nocork = FALSE;
    uncork = FALSE;
    verbose = FALSE;
    finalSleepSecs = 0;
    delayUsecs = 0;

    while ((opt = getopt(argc, argv, "d:us:vn")) != -1) {
        switch (opt) {
        case 'u':
            uncork = TRUE;
            break;
        case 's':
            finalSleepSecs = atoi(optarg);
            break;
        case 'n':
            nocork = TRUE;
            break;
        case 'v':
            verbose = TRUE;
            break;
        case 'd':
            delayUsecs = atoi(optarg);
            break;
        default:
            usageError(argv[0], "Bad option");
        } /* switch */
    } /* while */

    if (nocork && uncork)
        fatalErr("Can't specify both -n and -u options");
    if (argc != optind + 4 || strcmp(argv[optind], "--help") == 0)
        usageError(argv[0], NULL);

    numWrites = atoi(argv[optind + 2]);
    bufSize = atoi(argv[optind + 3]);

    /* Fill the buffer with a repeating 'a'..'z' pattern */
    buf = malloc(bufSize);
    if (buf == NULL) errExit("malloc");
    for (j = 0; (size_t) j < bufSize; j++)
        buf[j] = 'a' + j % 26;

    sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd == -1) errExit("socket");

    memset(&svaddr, 0, sizeof(struct sockaddr_in));
    svaddr.sin_family = AF_INET;
    svaddr.sin_port = htons(atoi(argv[optind + 1]));
    h = gethostbyname(argv[optind]);
    if (h == NULL)
        fatalErr("host lookup failed (gethostbyname())");
    addrpp = (struct in_addr **) h->h_addr_list;
    svaddr.sin_addr.s_addr = (*addrpp)->s_addr;

    if (connect(sfd, (struct sockaddr *) &svaddr,
                sizeof(struct sockaddr_in)) == -1) errExit("connect");

    if (!nocork) {
        optval = 1;
        if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
                       sizeof(optval)) == -1) errExit("setsockopt");
        traceInfo();
        printf("Enabled TCP_CORK\n");
    }

    /* Report the current TCP_CORK setting */
    optlen = sizeof(optval);
    if (getsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval, &optlen) == -1)
        errExit("getsockopt");
    traceInfo();
    printf("TCP_CORK=%d\n", optval);

    /* Perform the writes, optionally sleeping between them */
    for (j = 0; j < numWrites; j++) {
        if (write(sfd, buf, bufSize) != (ssize_t) bufSize)
            errExit("write");
        if (delayUsecs > 0) {
            if (verbose) {
                traceInfo();
                printf("sleep %d\n", j);
            }
            usleep(delayUsecs);
        }
    }
    traceInfo();
    printf("Completed writes\n");

    if (uncork) {
        traceInfo();
        printf("Disabling TCP_CORK\n");
        optval = 0;
        if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
                       sizeof(optval)) == -1) errExit("setsockopt");
    }

    if (finalSleepSecs > 0) {
        traceInfo();
        printf("Sleeping\n");
        sleep(finalSleepSecs);
    }

    close(sfd);
    exit(EXIT_SUCCESS);
} /* main */
--
Michael Kerrisk
mtk-lists@gmx.net