From: torvalds@transmeta.com (Linus Torvalds)
To: linux-kernel@vger.kernel.org
Subject: Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Date: 9 Jan 2001 14:25:43 -0800
Message-ID: <93g357$2jf$1@penguin.transmeta.com>
In-Reply-To: <20010109141806.F4284@redhat.com> <Pine.LNX.4.30.0101091532150.4368-100000@e2> <20010109151725.D9321@redhat.com>
In article <20010109151725.D9321@redhat.com>,
Stephen C. Tweedie <sct@redhat.com> wrote:
>
>Jes has also got hard numbers for the performance advantages of
>jumbograms on some of the networks he's been using, and you ain't
>going to get udp jumbograms through a page-by-page API, ever.
Wrong.
The only thing you need is a nagle-type thing that coalesces requests.
In the case of UDP, that coalescing obviously has to be explicitly
controlled, as the "standard" UDP behaviour is to send out just one
packet per write.
But this is a problem for TCP too: you want to tell TCP to _not_ send
out a short packet even if there are none in-flight, if you know you
want to send more. So you want to have some way to anti-nagle for TCP
anyway.
Also, if you look at the problem of "writev()", you'll notice that you
have many of the same issues: what you really want is to _always_
coalesce, and only send out when explicitly asked for (and then that
explicit ask would be on by default at the end of write() and at the
very end of the last segment in "writev()").
It so happens that this logic already exists, it's called MSG_MORE or
something similar (I'm too lazy to check the actual patches).
And it's there exactly because it is stupid to make the upper layers
have to gather everything into one packet if the lower layers need that
logic for other reasons anyway. Which they obviously do.
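
A minimal sketch of the explicitly controlled UDP coalescing described above, assuming a current Linux kernel, where send(2) documents that MSG_MORE on a UDP socket gathers the data into one datagram that goes out on the first call without the flag. The helper name udp_send_as_one_datagram and its use of struct iovec to describe the pieces are assumptions made for illustration, not part of the patch under discussion.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: hand the kernel several separate pieces, but have
 * them leave the machine as a single UDP datagram.
 * Returns 0 on success, -1 if any sendto() call fails.
 */
int udp_send_as_one_datagram(int sock, const struct sockaddr_in *dst,
                             const struct iovec *parts, int nparts)
{
    for (int i = 0; i < nparts; i++) {
        /* Every piece except the last says "more is coming". */
        int flags = (i < nparts - 1) ? MSG_MORE : 0;

        if (sendto(sock, parts[i].iov_base, parts[i].iov_len, flags,
                   (const struct sockaddr *)dst, sizeof *dst) < 0)
            return -1;
    }
    return 0;
}
```

The combined size is still bounded by the maximum UDP datagram size, and this sketch makes no attempt to clean up a half-built datagram if a later piece fails.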
So what you can do is to just do multiple writes, and set the MSG_MORE
flag. This works with sendfile(), but more importantly it is also an
uncommonly good interface to user mode. With this, you can actually
implement things like "writev()" _properly_ from user-space, and we
could get rid of the special socket writev() magic if we wanted to.
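
As a rough illustration of a user-space "writev()" built on this flag, the sketch below sends each segment with its own send() call and sets MSG_MORE on all but the last one. The function name writev_with_msg_more is an assumption, and unlike the real writev() this version is not atomic with respect to other writers on the same socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical user-space writev() for a stream socket.
 * Returns the total number of bytes sent, or -1 on error.
 */
ssize_t writev_with_msg_more(int sock, const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    for (int i = 0; i < iovcnt; i++) {
        const char *p = iov[i].iov_base;
        size_t left = iov[i].iov_len;
        /* Only the last segment is sent without MSG_MORE. */
        int flags = (i < iovcnt - 1) ? MSG_MORE : 0;

        while (left > 0) {
            ssize_t n = send(sock, p, left, flags);
            if (n < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, retry this chunk */
                return -1;
            }
            p += n;
            left -= (size_t)n;
            total += n;
        }
    }
    return total;
}
```

If the last segment is empty, no call without MSG_MORE is ever made, so a caller would want to drop trailing zero-length segments first.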
So if you have a header, you just send out that header separately (with
the MSG_MORE flag), and then do a "sendfile()" or whatever to send out
the data.
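
The header-then-file pattern might look roughly like the following, assuming Linux's send(2) and sendfile(2) as they exist today. The helper name send_header_then_file, the NUL-terminated header, and the choice to open the file by path are assumptions made to keep the sketch self-contained.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send a header string, then the whole file at path.
 * Returns 0 on success, -1 on error.
 */
int send_header_then_file(int sock, const char *hdr, const char *path)
{
    size_t left = strlen(hdr);
    struct stat st;
    off_t off = 0;
    int rc = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
        goto out;

    /* Header goes first, flagged MSG_MORE so it is not pushed out alone. */
    while (left > 0) {
        ssize_t n = send(sock, hdr, left, MSG_MORE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        hdr += n;
        left -= (size_t)n;
    }

    /* The file data follows without MSG_MORE, so nothing is held back. */
    while (off < st.st_size) {
        ssize_t n = sendfile(sock, fd, &off, (size_t)(st.st_size - off));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        if (n == 0)
            goto out;               /* file shrank underneath us */
    }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

With an empty file the header is the only data and was sent with MSG_MORE, so a real caller would probably special-case that and send the header without the flag.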
This is much more flexible than writev(), and a lot easier to use. It's
also a hell of a lot more flexible than the ugly sendfile() interfaces
that HP-UX and the BSD people have - I'm ashamed of how little taste the
BSD group in general has had in interface design. Ugh. Tacking on a
mixture of writev() and sendfile() in the same system call. Tacky.
Linus