From: Jens Axboe <axboe@suse.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Ingo Molnar <mingo@elte.hu>, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] splice support #2
Date: Fri, 31 Mar 2006 11:56:30 +0200
Message-ID: <20060331095629.GJ14022@suse.de>
In-Reply-To: <Pine.LNX.4.64.0603300853190.27203@g5.osdl.org>
On Thu, Mar 30 2006, Linus Torvalds wrote:
>
>
> On Thu, 30 Mar 2006, Jens Axboe wrote:
> > On Thu, Mar 30 2006, Ingo Molnar wrote:
> > >
> > > neat stuff. One question: why do we require fdin or fdout to be a pipe?
> > > Is there any fundamental problem with implementing what Larry's original
> > > paper described too: straight pagecache -> socket transfers? Without a
> > > pipe intermediary forced in between. It only adds unnecessary overhead.
> >
> > No, not a fundamental problem. I think I even hid that in some comment
> > in there, at least if it's decipherable by someone other than myself...
>
> Actually, there _is_ a fundamental problem. Two of them, in fact.
>
> The reason it goes through a pipe is two-fold:
>
> - the pipe _is_ the buffer. The reason sendfile() sucks is that sendfile
> cannot work with <n> different buffer representations. sendfile() only
> works with _one_ buffer representation, namely the "page cache of the
> file".
>
> By using the page cache directly, sendfile() doesn't need any extra
> buffering, but that's also why sendfile() fundamentally _cannot_ work
> with anything else. You cannot do "sendfile" between two sockets to
> forward data from one place to another, for example. You cannot do
> sendfile from a streaming device.
>
> The pipe is just the standard in-kernel buffer between two arbitrary
> points. Think of it as a scatter-gather list with a wait-queue. That's
> what a pipe _is_. Trying to get rid of the pipe totally misses the
> whole point of splice().
>
> Now, we could have a splice call that has an _implicit_ pipe, ie if
> neither side is a pipe, we could create a temporary pipe and thus
> allow what looks like a direct splice. But the pipe should still be
> there.
>
> - The pipe is the buffer #2: it's what allows you to do _other_ things
> with splice that are simply impossible to do with sendfile. Notably,
> splice allows very naturally the "readv/writev" scatter-gather
> behaviour of _mixing_ streams. If you're a web-server, with splice you
> can do
>
> write(pipefd, header, header_len);
> splice(file, pipefd, file_len);
> splice(pipefd, socket, total_len);
>
> (this is all conceptual pseudo-code, of course), and this very
> naturally has none of the issues that sendfile() has with plugging etc.
> There's never any "send header separately and do extra work to make
> sure it is in the same packet as the start of the data".
>
> So having a separate buffer even when you _do_ have a buffer like the
> page cache is still something you want to do.
>
> So there.
My point was mainly that the buffer itself need not necessarily be a
pipe; it could be implemented with a pipe just using the same buffer
type. But I guess it doesn't make much sense; the pipe has nice
advantages in itself.
--
Jens Axboe