From: "Michael T Kerrisk" <mtk-lists@gmx.net>
To: netdev@oss.sgi.com
Cc: michael.kerrisk@gmx.net
Subject: TCP_CORK 200ms maximum cork time -- expected behaviour?
Date: Fri, 20 Aug 2004 16:00:33 +0200 (MEST)
Message-ID: <18686.1093010433@www70.gmx.net>

Gidday,

I tried posting this several weeks back, but got no response.
I'll try again, this time with test programs and a sample run
(both below) that demonstrate what I'm seeing.

The TCP_CORK socket option allows us to perform multiple 
write()s (or send()s or sendfile()s) while delaying the 
transmission of an outgoing TCP segment until the option is 
disabled (or a full MSS-sized segment has been accumulated, 
or the socket is closed).
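
A minimal sketch of that usage pattern, assuming a TCP socket on
Linux.  It is an illustration only, not part of the test programs
below: the helper names setCork(), getCork() and sendCorked() are
made up for this sketch, and error handling is reduced to
returning -1.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Enable (on != 0) or disable (on == 0) TCP_CORK; returns 0 or -1 */
int
setCork(int fd, int on)
{
    int optval = on ? 1 : 0;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));
}

/* Return the current TCP_CORK setting (0 or 1), or -1 on error */
int
getCork(int fd)
{
    int optval;
    socklen_t optlen = sizeof(optval);

    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &optval, &optlen) == -1)
        return -1;
    return optval;
}

/* Hypothetical helper: cork, write a header and a body, then uncork so
   that whatever is still buffered is released for transmission */
int
sendCorked(int fd, const void *hdr, size_t hdrLen,
           const void *body, size_t bodyLen)
{
    if (setCork(fd, 1) == -1)
        return -1;
    if (write(fd, hdr, hdrLen) != (ssize_t) hdrLen)
        return -1;
    if (write(fd, body, bodyLen) != (ssize_t) bodyLen)
        return -1;
    return setCork(fd, 0);
}
```

The expectation behind sendCorked() is that the header and body
leave in as few segments as possible, with nothing sent before the
final setCork(fd, 0) unless a full segment accumulates first.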

All is fine and good, but there's one point I'm puzzled 
about: buffered data is still transmitted after a 200 
millisecond delay (counted from the time that the first 
corked byte was written), **even if TCP_CORK is still set**.  
So, I'm wondering:

1. Is this intended behaviour, or simply an 
   outgrowth of the combined implementations of 
   TCP_CORK and TCP_NAGLE_OFF?

2. If it's intended behaviour, what is the 
   rationale for the ceiling time on corking?

I first observed this behaviour quite some time back, but 
I've verified that it is still current (2.4.26 and 2.6.8.1 
kernels).  (In passing: of course, similar behaviour occurs 
with MSG_MORE on TCP sockets.)
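
For comparison, the per-call MSG_MORE variant can be sketched as
follows.  Again this is only an illustration, not one of the test
programs; sendMore() is a name made up here.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send 'len' bytes; if 'more' is nonzero, pass
   MSG_MORE to hint that further data will follow on this socket */
ssize_t
sendMore(int fd, const void *buf, size_t len, int more)
{
    assert(fd >= 0);
    return send(fd, buf, len, more ? MSG_MORE : 0);
}
```

A caller would pass more=1 for every piece except the last one, 
which plays the same role as disabling TCP_CORK after the final 
write.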

Here's what I see using my two test programs:

tcp_cork_receive port
    This binds to a port, accepts a connection
    and then reads blocks displaying them along
    with the time that the read() completed.

tcp_cork_send [options] server port num-writes buf-size
    Connect to server/port, perform specified number
    ('num-writes') of writes, each containing 'buf-size'
    bytes.  By default, this program enables TCP_CORK
    on the socket.

    Various options are provided, but the only one 
    needed for the test is '-d usecs' which specifies
    a number of microseconds to usleep() between writes.
    
In the following (run on 2.6.8.1), tcp_cork_send is used to 
write 100 bytes, one at a time, with a 10 millisecond delay 
between writes:

$ ./tcp_cork_receive 9999 &
[1] 8868
$ ./tcp_cork_send -d 10000 localhost 9999 100 1
[PID 8868] 1093009988.950: Receiver accepted connection
[PID 8869] 1093009988.951: Enabled TCP_CORK
[PID 8869] 1093009988.951: TCP_CORK=1
[PID 8868] 1093009989.152: [received 17 bytes]
[PID 8868] 1093009989.359: [received 17 bytes]
[PID 8868] 1093009989.563: [received 17 bytes]
[PID 8868] 1093009989.767: [received 17 bytes]
[PID 8868] 1093009989.971: [received 17 bytes]
[PID 8869] 1093009990.154: Completed writes
[PID 8868] 1093009990.155: [received 15 bytes]
[1]+  Done                    ./tcp_cork_receive 9999


The "received" messages appear every 200 milliseconds, even
though the sender did not disable TCP_CORK.  Based on what I’d
read/heard about TCP_CORK, I would have expected to see only 
one "received" after the sender had closed the socket.  But, 
instead, there is clearly a 200 millisecond ceiling on corkage.
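
To put numbers on the spacing, the gaps between the trace 
timestamps can be computed with a few lines of C.  This is just a 
sketch for checking the arithmetic; interArrivalGaps() is a name 
made up here, and the timestamps are assumed to have been converted 
by hand to millisecond offsets from 1093009988.000.

```c
#include <assert.h>
#include <stddef.h>

/* Store the differences between consecutive timestamps in 'gaps'
   (which must have room for n - 1 entries); return the number of
   gaps written */
size_t
interArrivalGaps(const long *stampsMs, size_t n, long *gaps)
{
    size_t j;

    assert(stampsMs != NULL && gaps != NULL);
    if (n < 2)
        return 0;
    for (j = 1; j < n; j++)
        gaps[j - 1] = stampsMs[j] - stampsMs[j - 1];
    return n - 1;
}
```

Fed the offsets 951 (TCP_CORK enabled), 1152, 1359, 1563, 1767, 
1971 and 2155, this gives gaps of 201, 207, 204, 204, 204 and 184 
milliseconds.  The last, shorter gap coincides with the sender 
completing its writes and closing the socket, and the byte counts 
add up as expected (5 * 17 + 15 = 100).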

Cheers,

Michael




/* tcp_cork_receive.c */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define errMsg(msg)     { perror(msg); }

#define errExit(msg)    { perror(msg); exit(EXIT_FAILURE); }

#define usageErr(msg, progName) \
                        { fprintf(stderr, "Usage: "); \
                          fprintf(stderr, msg, progName); \
                          exit(EXIT_FAILURE); }

static void
traceInfo(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
    printf("[PID %ld] %8.3f: ", (long) getpid(),
            tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */

int
main(int argc, char *argv[])
{
    int lfd, sfd;
    ssize_t numRead;
#define BUF_SIZE 100000
    char buf[BUF_SIZE];
    int optval;
    struct sockaddr_in svaddr;

    if (argc != 2  || strcmp(argv[1], "--help") == 0) {
        fprintf(stderr, "%s port\n", argv[0]);
        exit(EXIT_FAILURE);
    } 

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1) errExit("socket");

    memset(&svaddr, 0, sizeof(struct sockaddr_in));
    svaddr.sin_family = AF_INET;
    svaddr.sin_port = htons(atoi(argv[1]));
    svaddr.sin_addr.s_addr = htonl(INADDR_ANY);

    optval = 1;
    if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                sizeof(optval)) == -1) errExit("setsockopt");

    if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in))
            == -1) errExit("bind");

    if (listen(lfd, 5) == -1) errExit("listen");

    sfd = accept(lfd, NULL, NULL);
    if (sfd == -1) errExit("accept");

    traceInfo();
    printf("Receiver accepted connection\n");

    for (;;) {
        numRead = read(sfd, buf, BUF_SIZE);
        if (numRead == -1) errExit("read");
        if (numRead == 0)
            break;
        traceInfo();
        /* printf("[received %ld bytes] %.*s\n", (long) numRead,
           (int) numRead, buf); */
        printf("[received %ld bytes]\n", (long) numRead);
    } 

    close(lfd);
    close(sfd);
    exit(EXIT_SUCCESS);
} /* main */




/* tcp_cork_send.c */

#define _XOPEN_SOURCE 500    
#include <sys/types.h>
#include <unistd.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/socket.h>
#include <netdb.h>           /* gethostbyname(), struct hostent */

typedef enum { FALSE, TRUE } Boolean;

#define errMsg(msg)     { perror(msg); }

#define errExit(msg)    { perror(msg); exit(EXIT_FAILURE); }

#define fatalErr(msg)   { fprintf(stderr, "%s\n", msg); \
                          exit(EXIT_FAILURE); }

static void
traceInfo(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) errExit("gettimeofday");
    printf("[PID %ld] %8.3f: ", (long) getpid(),
            tv.tv_sec + tv.tv_usec / 1000000.0);
} /* traceInfo */

static void
usageError(char *progName, char *msg)
{
    if (msg != NULL)
        fprintf(stderr, "%s\n", msg);

    fprintf(stderr,
        "%s [options] server port num-writes buf-size\n"
        "\tnum-writes       Number of writes\n"
        "\tbuf-size         Size of buffer for each write\n"
        "\tOptions are:\n"
        "\t\t-d usecs  Microsecs delay between each write\n"
        "\t\t-n        Don't enable TCP_CORK before writes\n"
        "\t\t-s secs   Sleep for 'secs' seconds before closing socket\n"
        "\t\t-u        Disable TCP_CORK immediately after sending\n"
        "\t\t-v        Verbose reporting of (delayed) writes\n"
        ,
        progName);
    exit(EXIT_FAILURE);
} /* usageError */

int
main(int argc, char *argv[])
{
    int numWrites, j, sfd;
    useconds_t delayUsecs;
    size_t bufSize;
    char *buf;
    int optval;
    int opt;
    int finalSleepSecs;
    Boolean nocork, uncork, verbose;
    socklen_t optlen;
    struct sockaddr_in svaddr;
    struct hostent *h;
    struct in_addr **addrpp;

    nocork = FALSE;
    uncork = FALSE;
    verbose = FALSE;
    finalSleepSecs = 0;
    delayUsecs = 0;

    while ((opt = getopt(argc, argv, "d:us:vn")) != -1) {
        switch (opt) {
        case 'u':
            uncork = TRUE;
            break;

        case 's':
            finalSleepSecs = atoi(optarg);
            break;

        case 'n':
            nocork = TRUE;
            break;

        case 'v':
            verbose = TRUE;
            break;

        case 'd':
            delayUsecs = atoi(optarg);
            break;

        default:
            usageError(argv[0], "Bad option");
        } /* switch */
    } /* while */

    if (nocork && uncork)
        fatalErr("Can't specify both -n and -u options");

    if (argc != optind + 4 || strcmp(argv[optind], "--help") == 0)
        usageError(argv[0], NULL);

    numWrites = atoi(argv[optind + 2]);
    bufSize = atoi(argv[optind + 3]);

    buf = malloc(bufSize);
    if (buf == NULL) errExit("malloc");
    for (j = 0; j < (int) bufSize; j++)
        buf[j] = 'a' + j % 26;

    sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd == -1) errExit("socket");

    memset(&svaddr, 0, sizeof(struct sockaddr_in));
    svaddr.sin_family = AF_INET;
    svaddr.sin_port = htons(atoi(argv[optind + 1]));

    h = gethostbyname(argv[optind]);
    if (h == NULL)
        fatalErr("host lookup failed (gethostbyname())");
    addrpp = (struct in_addr **) h->h_addr_list;
    svaddr.sin_addr.s_addr = (*addrpp)->s_addr;

    if (connect(sfd, (struct sockaddr *) &svaddr,
            sizeof(struct sockaddr_in)) == -1) errExit("connect");

    if (!nocork) {
        optval = 1;
        if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
                sizeof(optval)) == -1) errExit("setsockopt");

        traceInfo();
        printf("Enabled TCP_CORK\n");
    } 

    optlen = sizeof(optval);
    if (getsockopt(sfd, IPPROTO_TCP,  TCP_CORK, &optval, &optlen) == -1)
        errExit("getsockopt");
    traceInfo();
    printf("TCP_CORK=%d\n", optval);

    for (j = 0; j < numWrites; j++) {
         if (write(sfd, buf, bufSize) != (ssize_t) bufSize)
             errExit("write");

         if (delayUsecs > 0) {
             if (verbose) {
                 traceInfo();
                 printf("sleep %d\n", j);
             } 
             usleep(delayUsecs);
         } 
    } 

    traceInfo();
    printf("Completed writes\n");

    if (uncork) {
        traceInfo();
        printf("Disabling TCP_CORK\n");
        optval = 0;
        if (setsockopt(sfd, IPPROTO_TCP, TCP_CORK, &optval,
                sizeof(optval)) == -1) errExit("setsockopt");
    } 

    if (finalSleepSecs > 0) {
        traceInfo();
        printf("Sleeping\n");
        sleep(finalSleepSecs);
    } 

    close(sfd);
    exit(EXIT_SUCCESS);
} /* main */

-- 
Michael Kerrisk
mtk-lists@gmx.net

