From: lm@bitmover.com (Larry McVoy)
To: David Miller <davem@davemloft.net>
Cc: lm@bitmover.com, torvalds@linux-foundation.org,
wscott@bitmover.com, netdev@vger.kernel.org
Subject: Re: tcp bw in 2.6
Date: Mon, 1 Oct 2007 21:23:00 -0700
Message-ID: <20071002042300.GC5480@bitmover.com>
In-Reply-To: <20071001.205050.66151815.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1858 bytes --]
On Mon, Oct 01, 2007 at 08:50:50PM -0700, David Miller wrote:
> From: lm@bitmover.com (Larry McVoy)
> Date: Mon, 1 Oct 2007 19:20:59 -0700
>
> > A short summary is "can someone please post a test program that sources
> > and sinks data at the wire speed?" because apparently I'm too old and
> > clueless to write such a thing.
>
> You're not showing us your test program so there is no way we
> can help you out.
Attached. Drop it into an lmbench tree and build it.
> My initial inclination, even without that critical information,
> is to ask whether you are setting any socket options in any way?
The only one I was playing with was SO_RCVBUF/SO_SNDBUF and I tried
disabling that and I tried playing with the read/write size. Didn't
help.
> In particular, SO_RCVLOWAT can have a large effect here, if you're
> setting it to something, that would explain why dd is doing better. A
> lot of people link to "helper libraries" with interfaces to setup
> sockets with all sorts of socket option settings by default, try not
> using such things if possible.
Agreed. That was my first thought as well, I must have been doing
something that messed up the defaults. But you did get the strace
output, there wasn't anything weird there.
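One way to double-check that nothing has touched the defaults is to read the options back from the live socket rather than rely on strace alone. The following is a minimal sketch, assuming a POSIX sockets environment; the helper names get_int_sockopt() and report_sockopts() are hypothetical and are not part of lmbench.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Fetch one integer SOL_SOCKET option; returns 0 on success, -1 on error. */
int
get_int_sockopt(int fd, int opt, int *val)
{
    socklen_t len = sizeof(*val);

    return getsockopt(fd, SOL_SOCKET, opt, val, &len);
}

/*
 * Print the options discussed in this thread for one socket.
 * Returns 0 if every option could be read, -1 otherwise.
 * (Hypothetical helper, not part of lmbench.)
 */
int
report_sockopts(int fd)
{
    static const struct { int opt; const char *name; } opts[] = {
        { SO_RCVLOWAT, "SO_RCVLOWAT" },
        { SO_RCVBUF,   "SO_RCVBUF"   },
        { SO_SNDBUF,   "SO_SNDBUF"   },
    };
    int rc = 0;
    size_t i;

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
        int val = 0;

        if (get_int_sockopt(fd, opts[i].opt, &val) < 0) {
            perror(opts[i].name);
            rc = -1;
            continue;
        }
        fprintf(stderr, "fd %d: %s = %d\n", fd, opts[i].name, val);
    }
    return rc;
}
```

Calling report_sockopts() on the connected socket just before the transfer loop, in both the client and the server, would show whether a helper library changed anything. The values printed depend on the kernel and its sysctl settings, so none are assumed here.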
> You also shouldn't dork at all with the receive and send buffer sizes.
> They are adjusted dynamically by the kernel as the window grows. But
> if you set them to specific values, this dynamic logic is turned off.
Yeah, dorking with those is left over from the bad old days of '95
when lmbench was first shipped. But I turned that all off and no
difference.
So feel free to show me where I'm an idiot in the code, but if you
can't, then what would rock would be a little send.c / recv.c that
demonstrated filling the pipe.
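As a rough illustration of the shape such a standalone sender/receiver pair could take, here is a minimal sketch that runs both ends in one program over the loopback interface. It is an assumption-laden sketch, not a wire-speed test: it never touches a NIC, it sets no socket options, and the function name loopback_xfer() is hypothetical. Splitting the child branch into send.c and the parent branch into recv.c, with a real address in place of INADDR_LOOPBACK, would give the two-host form.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Push `total` bytes through a loopback TCP connection using
 * `chunk`-sized writes and reads.  Returns the number of bytes the
 * sink received, or -1 on failure.  If `secs` is non-NULL it receives
 * the sink's elapsed wall-clock time.  Error paths do not clean up;
 * this is a sketch, not production code.
 */
long long
loopback_xfer(long long total, size_t chunk, double *secs)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    struct timespec t0, t1;
    long long got = 0;
    ssize_t n;
    pid_t pid;
    int lsock, conn;
    char *buf = calloc(1, chunk);

    if (!buf || (lsock = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                    /* let the kernel pick a port */
    if (bind(lsock, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&sa, &len) < 0)
        return -1;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                     /* child: the source */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        long long left = total;

        if (s < 0 || connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            _exit(1);
        while (left > 0) {
            size_t want = left < (long long)chunk ? (size_t)left : chunk;

            n = write(s, buf, want);
            if (n <= 0)
                _exit(2);
            left -= n;
        }
        close(s);
        _exit(0);
    }

    /* parent: the sink; read until the source closes the connection */
    if ((conn = accept(lsock, NULL, NULL)) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(conn, buf, chunk)) > 0)
        got += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(conn);
    close(lsock);
    waitpid(pid, NULL, 0);
    free(buf);
    if (secs)
        *secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return n < 0 ? -1 : got;
}
```

Dividing the returned byte count by the elapsed seconds gives a throughput figure. The timer starts after accept(), so connection setup is excluded from the measurement.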
--
---
Larry McVoy lm at bitmover.com http://www.bitkeeper.com
[-- Attachment #2: bytes_tcp.c --]
[-- Type: text/x-csrc, Size: 2278 bytes --]
/*
* bytes_tcp.c - simple TCP bandwidth source/sink
*
* server usage: bytes_tcp -s
* client usage: bytes_tcp hostname [msgsize]
*
* Copyright (c) 1994 Larry McVoy.
* Copyright (c) 2002 Carl Staelin. Distributed under the FSF GPL with
* additional restriction that results may published only if
* (1) the benchmark is unmodified, and
* (2) the version in the sccsid below is included in the report.
* Support for this development by Sun Microsystems is gratefully acknowledged.
*/
char *id = "$Id$\n";
#include "bench.h"
#define XFER (1024*1024)
int server_main(int ac, char **av);
int client_main(int ac, char **av);
void source(int data);
void
transfer(int get, int server, char *buf)
{
	int c = 0;	/* initialized so the check below is defined when get <= 0 */

	while ((get > 0) && (c = read(server, buf, XFER)) > 0) {
		get -= c;
	}
	if (c < 0) {
		perror("bytes_tcp: transfer: read failed");
		exit(4);
	}
}
/* ARGSUSED */
int
client_main(int ac, char **av)
{
	int server;
	int get = 256 << 20;
	char buf[XFER];
	char* usage = "usage: %s -remotehost OR %s remotehost [msgsize]\n";

	if (ac != 2 && ac != 3) {
		(void)fprintf(stderr, usage, av[0], av[0]);
		exit(0);
	}
	if (ac == 3) get = bytes(av[2]);
	server = tcp_connect(av[1], TCP_DATA+1, SOCKOPT_READ|SOCKOPT_REUSE);
	if (server < 0) {
		perror("bytes_tcp: could not open socket to server");
		exit(2);
	}
	transfer(get, server, buf);
	close(server);
	exit(0);
	/*NOTREACHED*/
}
void
child(int sig)
{
	(void)sig;	/* unused; present to match the signal handler signature */
	wait(0);
	signal(SIGCHLD, child);
}
/* ARGSUSED */
int
server_main(int ac, char **av)
{
	int data, newdata;

	signal(SIGCHLD, child);
	data = tcp_server(TCP_DATA+1, SOCKOPT_READ|SOCKOPT_WRITE|SOCKOPT_REUSE);
	for ( ;; ) {
		newdata = tcp_accept(data, SOCKOPT_WRITE|SOCKOPT_READ);
		switch (fork()) {
		    case -1:
			perror("fork");
			break;
		    case 0:
			source(newdata);
			exit(0);
		    default:
			close(newdata);
			break;
		}
	}
}
void
source(int data)
{
	char buf[XFER];

	while (write(data, buf, sizeof(buf)) > 0);
}
int
main(int ac, char **av)
{
	char* usage = "Usage: %s -s OR %s -serverhost OR %s serverhost [msgsize]\n";

	if (ac < 2 || 3 < ac) {
		fprintf(stderr, usage, av[0], av[0], av[0]);
		exit(1);
	}
	if (ac == 2 && !strcmp(av[1], "-s")) {
		if (fork() == 0) server_main(ac, av);
		exit(0);
	} else {
		client_main(ac, av);
	}
	return(0);
}
[-- Attachment #3: lib_tcp.c --]
[-- Type: text/x-csrc, Size: 5094 bytes --]
/*
* tcp_lib.c - routines for managing TCP connections.
*
* Positive port/program numbers are RPC ports, negative ones are TCP ports.
*
* Copyright (c) 1994-1996 Larry McVoy.
*/
#define _LIB /* bench.h needs this */
#include "bench.h"
/*
* Get a TCP socket, bind it, figure out the port,
* and advertise the port as program "prog".
*
* XXX - it would be nice if you could advertise ascii strings.
*/
int
tcp_server(int prog, int rdwr)
{
	int sock;
	struct sockaddr_in s;

#ifdef LIBTCP_VERBOSE
	/* prog may be negative (plain TCP port), so print it as signed */
	fprintf(stderr, "tcp_server(%d, %d)\n", prog, rdwr);
#endif
	if ((sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
		perror("socket");
		exit(1);
	}
	sock_optimize(sock, rdwr);
	bzero((void*)&s, sizeof(s));
	s.sin_family = AF_INET;
	if (prog < 0) {
		s.sin_port = htons(-prog);
	}
	if (bind(sock, (struct sockaddr*)&s, sizeof(s)) < 0) {
		perror("bind");
		exit(2);
	}
	if (listen(sock, 100) < 0) {
		perror("listen");
		exit(4);
	}
	if (prog > 0) {
#ifdef LIBTCP_VERBOSE
		fprintf(stderr, "Server port %d\n", sockport(sock));
#endif
		(void)pmap_unset((u_long)prog, (u_long)1);
		if (!pmap_set((u_long)prog, (u_long)1, (u_long)IPPROTO_TCP,
		    (unsigned short)sockport(sock))) {
			perror("pmap_set");
			exit(5);
		}
	}
	return (sock);
}
/*
* Unadvertise the socket
*/
int
tcp_done(int prog)
{
	if (prog > 0) {
		pmap_unset((u_long)prog, (u_long)1);
	}
	return (0);
}
/*
* Accept a connection and return it
*/
int
tcp_accept(int sock, int rdwr)
{
	struct sockaddr_in s;
	int newsock;
	socklen_t namelen;	/* accept() takes a socklen_t *, not an int * */

	namelen = sizeof(s);
	bzero((void*)&s, namelen);
retry:
	if ((newsock = accept(sock, (struct sockaddr*)&s, &namelen)) < 0) {
		if (errno == EINTR)
			goto retry;
		perror("accept");
		exit(6);
	}
#ifdef LIBTCP_VERBOSE
	fprintf(stderr, "Server newsock port %d\n", sockport(newsock));
#endif
	sock_optimize(newsock, rdwr);
	return (newsock);
}
/*
* Connect to the TCP socket advertised as "prog" on "host" and
* return the connected socket.
*
* Hacked Thu Oct 27 1994 to cache pmap_getport calls. This saves
* about 4000 usecs in loopback lat_connect calls. I suppose we
* should time gethostbyname() & pmap_getprot(), huh?
*/
int
tcp_connect(char *host, int prog, int rdwr)
{
	static struct hostent *h;
	static struct sockaddr_in s;
	static u_short save_port;
	static u_long save_prog;
	static char *save_host;
	int sock;
	static int tries = 0;

	if ((sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
		perror("socket");
		exit(1);
	}
	if (rdwr & SOCKOPT_PID) {
		static unsigned short port;
		struct sockaddr_in sin;

		if (!port) {
			port = (unsigned short)(getpid() << 4);
			if (port < 1024) {
				port += 1024;
			}
		}
		do {
			port++;
			bzero((void*)&sin, sizeof(sin));
			sin.sin_family = AF_INET;
			sin.sin_port = htons(port);
		} while (bind(sock, (struct sockaddr*)&sin, sizeof(sin)) == -1);
	}
#ifdef LIBTCP_VERBOSE
	else {
		struct sockaddr_in sin;

		bzero((void*)&sin, sizeof(sin));
		sin.sin_family = AF_INET;
		if (bind(sock, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
			perror("bind");
			exit(2);
		}
	}
	fprintf(stderr, "Client port %d\n", sockport(sock));
#endif
	sock_optimize(sock, rdwr);
	if (!h || host != save_host || prog != save_prog) {
		save_host = host;	/* XXX - counting on them not
					 * changing it - benchmark only.
					 */
		save_prog = prog;
		if (!(h = gethostbyname(host))) {
			perror(host);
			exit(2);
		}
		bzero((void *) &s, sizeof(s));
		s.sin_family = AF_INET;
		bcopy((void*)h->h_addr, (void *)&s.sin_addr, h->h_length);
		if (prog > 0) {
			save_port = pmap_getport(&s, prog,
			    (u_long)1, IPPROTO_TCP);
			if (!save_port) {
				perror("lib TCP: No port found");
				exit(3);
			}
#ifdef LIBTCP_VERBOSE
			fprintf(stderr, "Server port %d\n", save_port);
#endif
			s.sin_port = htons(save_port);
		} else {
			s.sin_port = htons(-prog);
		}
	}
	if (connect(sock, (struct sockaddr*)&s, sizeof(s)) < 0) {
		if (errno == ECONNRESET || errno == ECONNREFUSED) {
			close(sock);
			if (++tries > 10) return(-1);
			return (tcp_connect(host, prog, rdwr));
		}
		perror("connect");
		exit(4);
	}
	tries = 0;
	return (sock);
}
#define LIBTCP_VERBOSE
void
sock_optimize(int sock, int flags)
{
	/*
	 * Short-circuited for this test: every socket option below is
	 * skipped, so sockets keep the kernel defaults.
	 */
	return;
	if (flags & SOCKOPT_READ) {
		int sockbuf = SOCKBUF;

		while (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &sockbuf,
		    sizeof(int))) {
			sockbuf >>= 1;
		}
#ifdef LIBTCP_VERBOSE
		fprintf(stderr, "sockopt %d: RCV: %dK\n", sock, sockbuf>>10);
#endif
	}
	if (flags & SOCKOPT_WRITE) {
		int sockbuf = SOCKBUF;

		while (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sockbuf,
		    sizeof(int))) {
			sockbuf >>= 1;
		}
#ifdef LIBTCP_VERBOSE
		fprintf(stderr, "sockopt %d: SND: %dK\n", sock, sockbuf>>10);
#endif
	}
	if (flags & SOCKOPT_REUSE) {
		int val = 1;

		if (setsockopt(sock, SOL_SOCKET,
		    SO_REUSEADDR, &val, sizeof(val)) == -1) {
			perror("SO_REUSEADDR");
		}
	}
}
int
sockport(int s)
{
	socklen_t namelen;	/* getsockname() takes a socklen_t *, not an int * */
	struct sockaddr_in sin;

	namelen = sizeof(sin);
	if (getsockname(s, (struct sockaddr *)&sin, &namelen) < 0) {
		perror("getsockname");
		return(-1);
	}
	return ((int)ntohs(sin.sin_port));
}