From: "Michael T Kerrisk" <mtk-lists@gmx.net>
To: netdev@oss.sgi.com
Cc: michael.kerrisk@gmx.net
Subject: TCP_CORK 200ms maximum cork time -- expected behaviour?
Date: Fri, 20 Aug 2004 16:00:33 +0200 (MEST)
Message-ID: <18686.1093010433@www70.gmx.net>
Gidday,
I tried posting this several weeks back, but got no response.
I'll try again, this time with programs (see below) that
demonstrate (also see below) what I'm seeing.
The TCP_CORK socket option allows us to perform multiple
write()s (or send()s or sendfile()s) while delaying the
transmission of an outgoing TCP segment until the option is
disabled (or a segment MSS is filled or the socket is closed).
All is fine and good, but there's one point I'm puzzled
about: even when TCP_CORK is set, buffered data will still be
transmitted after a 200 millisecond delay (the delay counts
from the time that the first corked byte was written),
**even if TCP_CORK is still set**. So, I'm wondering:
1. Is this intended behaviour, or simply an
outgrowth of the combined implementations of
TCP_CORK and TCP_NAGLE_OFF?
2. If it's intended behaviour, what is the
rationale for the ceiling time on corking?
I first observed this behaviour quite some time back, but
I've verified that it is still current (2.4.26 and 2.6.8.1
kernels). (In passing: of course, similar behaviour occurs
with MSG_MORE on TCP sockets.)
Here's what I see using my two test programs:
tcp_cork_receive port
This binds to a port, accepts a connection,
and then reads blocks of data, displaying the size
of each along with the time that the read() completed.
tcp_cork_send [options] server port num-writes buf-size
Connect to server/port, perform specified number
('num-writes') of writes, each containing 'buf-size'
bytes. By default, this program enables TCP_CORK
on the socket.
Various options are provided, but the only one
needed for the test is '-d usecs' which specifies
a number of microseconds to usleep() between writes.
In the following (run on 2.6.8.1), tcp_cork_send is used to
write 100 bytes, one at a time, with a 10 millisecond delay
between writes:
$ ./tcp_cork_receive 9999 &
[1] 8868
$ ./tcp_cork_send -d 10000 localhost 9999 100 1
[PID 8868] 1093009988.950: Receiver accepted connecton
[PID 8869] 1093009988.951: Enabled TCP_CORK
[PID 8869] 1093009988.951: TCP_CORK=1
[PID 8868] 1093009989.152: [received 17 bytes]
[PID 8868] 1093009989.359: [received 17 bytes]
[PID 8868] 1093009989.563: [received 17 bytes]
[PID 8868] 1093009989.767: [received 17 bytes]
[PID 8868] 1093009989.971: [received 17 bytes]
[PID 8869] 1093009990.154: Completed writes
[PID 8868] 1093009990.155: [received 15 bytes]
[1]+ Done ./tcp_cork_receive 9999
The "received" messages appear every 200 milliseconds, even
though the sender did not disable TCP_CORK. Based on what I'd
read/heard about TCP_CORK, I would have expected to see only
one "received" after the sender had closed the socket. But,
instead, there is clearly a 200 millisecond ceiling on corkage.
Cheers,
Michael
/* tcp_cork_receive.c */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#define errMsg(msg) { perror(msg); }
#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }
#define usageErr(msg, progName) \
{ fprintf(stderr, "Usage: "); \
fprintf(stderr, msg, progName); \
exit(EXIT_FAILURE); }
static void
traceInfo(void)
{
struct timeval tv;
if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
printf("[PID %ld] %8.3f: ", (long) getpid(),
tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */
int
main(int argc, char *argv[])
{
int lfd, sfd;
ssize_t numRead;
#define BUF_SIZE 100000
char buf[BUF_SIZE];
int optval;
struct sockaddr_in svaddr;
if (argc != 2 || strcmp(argv[1], "--help") == 0) {
fprintf(stderr, "%s port\n", argv[0]);
exit(EXIT_FAILURE);
}
lfd = socket(AF_INET, SOCK_STREAM, 0);
if (lfd == -1) errExit("socket");
memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_port = htons(atoi(argv[1]));
svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
optval = 1;
if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval)) == -1) errExit("setsockopt");
if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in))
== -1) errExit("bind");
if (listen(lfd, 5) == -1) errExit("listen");
sfd = accept(lfd, NULL, NULL);
if (sfd == -1) errExit("accept");
traceInfo();
printf("Receiver accepted connecton\n");
for (;;) {
numRead = read(sfd, buf, BUF_SIZE);
if (numRead == -1) errExit("read");
if (numRead == 0)
break;
traceInfo();
/* printf("[received %ld bytes] %.*s\n", (long) numRead,
(int) numRead, buf); */
printf("[received %ld bytes]\n", (long) numRead);
}
close(lfd);
close(sfd);
exit(EXIT_SUCCESS);
} /* main */
/* tcp_cork_send.c */
#define _XOPEN_SOURCE 500
#include <sys/types.h>
#include <unistd.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/socket.h>
#include <netdb.h> /* gethostbyname(), struct hostent */
typedef enum { FALSE, TRUE } Boolean;
#define errMsg(msg) { perror(msg); }
#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }
#define fatalErr(msg) { fprintf(stderr, "%s\n", msg); \
exit(EXIT_FAILURE); }
static void
traceInfo(void)
{
struct timeval tv;
if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
printf("[PID %ld] %8.3f: ", (long) getpid(),
tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */
static void
usageError(char *progName, char *msg)
{
if (msg != NULL)
fprintf(stderr, "%s\n", msg);
fprintf(stderr,
"%s [options] server port num-writes buf-size\n"
"\tnum-writes Number of writes\n"
"\tbuf-size Size of buffer for each write\n"
"\tOptions are:\n"
"\t\t-d usecs Microsecs delay between each write\n"
"\t\t-n Don't enable TCP_CORK before writes\n"
"\t\t-s nsecs Sleep for 'nsecs' seconds before closing socket\n"
"\t\t-u Disable TCP_CORK immediately after sending\n"
"\t\t-v Verbose reporting of (delayed) writes\n"
,
progName);
exit(EXIT_FAILURE);
} /* usageError */
int
main(int argc, char *argv[])
{
int numWrites, j, sfd;
useconds_t delayUsecs;
size_t bufSize;
char *buf;
int optval;
int opt;
int finalSleepSecs;
Boolean nocork, uncork, verbose;
socklen_t optlen;
struct sockaddr_in svaddr;
struct hostent *h;
struct in_addr **addrpp;
nocork = FALSE;
uncork = FALSE;
verbose = FALSE;
finalSleepSecs = 0;
delayUsecs = 0;
while ((opt = getopt(argc, argv, "d:us:vn")) != -1) {
switch (opt) {
case 'u':
uncork = TRUE;
break;
case 's':
finalSleepSecs = atoi(optarg);
break;
case 'n':
nocork = TRUE;
break;
case 'v':
verbose = TRUE;
break;
case 'd':
delayUsecs = atoi(optarg);
break;
default:
usageError(argv[0], "Bad option");
} /* switch */
} /* while */
if (nocork && uncork)
fatalErr("Can't specify both -n and -u options");
if (argc != optind + 4 || strcmp(argv[optind], "--help") == 0)
usageError(argv[0], NULL);
numWrites = atoi(argv[optind + 2]);
bufSize = atoi(argv[optind + 3]);
buf = malloc(bufSize);
if (buf == NULL) errExit("malloc");
for (j = 0; (size_t) j < bufSize; j++)
buf[j] = 'a' + j % 26;
sfd = socket(AF_INET, SOCK_STREAM, 0);
if (sfd == -1) errExit("socket");
memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_port = htons(atoi(argv[optind + 1]));
h = gethostbyname(argv[optind]);
if (h == NULL)
fatalErr("host lookup failed (gethostbyname())");
addrpp = (struct in_addr **) h->h_addr_list;
svaddr.sin_addr.s_addr = (*addrpp)->s_addr;
if (connect(sfd, (struct sockaddr *) &svaddr,
sizeof(struct sockaddr_in)) == -1) errExit("connect");
if (!nocork) {
optval = 1;
if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
sizeof(optval)) == -1) errExit("setsockopt");
traceInfo();
printf("Enabled TCP_CORK\n");
}
optlen = sizeof(optval);
if (getsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval, &optlen) == -1)
errExit("getsockopt");
traceInfo();
printf("TCP_CORK=%d\n", optval);
for (j = 0; j < numWrites; j++) {
if (write(sfd, buf, bufSize) != (ssize_t) bufSize)
errExit("write");
if (delayUsecs > 0) {
if (verbose) {
traceInfo();
printf("sleep %d\n", j);
}
usleep(delayUsecs);
}
}
traceInfo();
printf("Completed writes\n");
if (uncork) {
traceInfo();
printf("Disabling TCP_CORK\n");
optval = 0;
if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
sizeof(optval)) == -1) errExit("setsockopt");
}
if (finalSleepSecs > 0) {
traceInfo();
printf("Sleeping\n");
sleep(finalSleepSecs);
}
close(sfd);
exit(EXIT_SUCCESS);
} /* main */
--
Michael Kerrisk
mtk-lists@gmx.net