From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jeff Garzik <jeff@garzik.org>, Jens Axboe <axboe@suse.de>,
linux-kernel@vger.kernel.org, akpm@osdl.org,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] splice support #2
Date: Fri, 31 Mar 2006 09:09:32 +0200
Message-ID: <20060331070931.GA25853@elte.hu>
In-Reply-To: <Pine.LNX.4.64.0603301259220.27203@g5.osdl.org>
* Linus Torvalds <torvalds@osdl.org> wrote:
> In particular, what happens when you try to connect two streaming
> devices, but the destination stops accepting data? You cannot put the
> received data "back" into the streaming source any way - so if you
> actually want to be able to handle error recovery, you _have_ to get
> access to the source buffers.
I'd rather implement this error case as an exception mechanism instead
of a forced intermediary buffer mechanism.
We should extend the userspace API so that it is prepared to receive
'excess data' via a separate 'flush excess data to' file descriptor:
    sys_splice(fd_in, fd_out, fd_flush, size, max_flush_size, *bytes_flushed)
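The proposed semantics can be emulated in userspace to see how they behave. The sketch below is a minimal C illustration under stated assumptions, not the kernel implementation: the name splice_flush_emul(), the 4 KiB bounce buffer, treating max_flush_size == 0 as "no extra limit", and stopping after the first failed write are all hypothetical choices, and it copies through user memory, which an in-kernel version would avoid. What it does show is that fd_flush is only touched on the error path, and that fd_flush == -1 simply discards the excess.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace emulation of the proposed
 *   sys_splice(fd_in, fd_out, fd_flush, size, max_flush_size, *bytes_flushed)
 * Data is copied fd_in -> fd_out. If fd_out stops accepting data, the bytes
 * already read from fd_in but not yet written are pushed into fd_flush
 * (at most max_flush_size of them; 0 means no extra limit in this sketch).
 * fd_flush is only used on that error path; fd_flush == -1 drops the excess.
 * Returns the number of bytes delivered to fd_out, or -1 if fd_in fails.
 */
ssize_t splice_flush_emul(int fd_in, int fd_out, int fd_flush,
                          size_t size, size_t max_flush_size,
                          size_t *bytes_flushed)
{
    char buf[4096];               /* bounce buffer; the kernel would avoid it */
    size_t done = 0;

    *bytes_flushed = 0;
    while (done < size) {
        size_t want = size - done < sizeof buf ? size - done : sizeof buf;
        ssize_t got = read(fd_in, buf, want);
        if (got < 0 && errno == EINTR)
            continue;
        if (got < 0)
            return -1;
        if (got == 0)
            break;                                /* EOF on the source */

        ssize_t off = 0;
        while (off < got) {                       /* fastpath: plain writes */
            ssize_t w = write(fd_out, buf + off, (size_t)(got - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                break;                            /* destination gave up */
            off += w;
        }
        done += (size_t)off;

        if (off < got) {                          /* exception path */
            size_t excess = (size_t)(got - off);
            if (max_flush_size != 0 && excess > max_flush_size)
                excess = max_flush_size;          /* the rest is lost (Note4) */
            if (fd_flush >= 0) {
                ssize_t f = write(fd_flush, buf + off, excess);
                if (f > 0)
                    *bytes_flushed = (size_t)f;
            }
            break;                                /* stop after the first failure */
        }
    }
    return (ssize_t)done;
}
```

For example, if fd_out rejects writes outright, the chunk just read from fd_in ends up in fd_flush, the function returns 0, and *bytes_flushed holds the size of that chunk.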
Note1: fd_flush can be a pipe too! This would avoid copies in the
exception case - if the exception case is expected to be common.
Note2: max_flush_size serves as a hint and as a natural 'buffering limit'
for the kernel-internal loops. I believe it's more natural than
the implicit 'pipe buffering limit' we currently have.
max_flush_size == 0 would say to the kernel: 'use whatever
buffering is natural or necessary'. E.g. if fd_flush is a pipe,
it would automatically set the buffering size to the flush-pipe's
internal buffering limit.
Note3: we could even eliminate the "*bytes_flushed" parameter from
       the syscall, as fd_flush's seek offset tells userspace
       how much data was written to it.
Note4: if the user messes up fd_flush so that the kernel's "excess data"
transfer into fd_flush fails, then that's 'tough luck' and flush
data may be lost. Users can use pipes [if the exception case is
common and they want to optimize that codepath] or can pre-write
their files if they need a 100% guarantee.
In fact, the kernel doesn't even have to _look up_ fd_flush in the
common case. It's the application's responsibility to make sure
the exception case will work. This means that the _only_ overhead
of this exception mechanism is the 2-3 extra parameters to
sys_splice(). That's _much_ faster.
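Notes 3 and 4 could be exercised from userspace along these lines, assuming fd_flush is a regular file. Two hedges apply: posix_fallocate() is swapped in for literally pre-writing the file (it reserves the blocks up front, and glibc emulates it on filesystems without native support), and the helper names prealloc_flush_file() and flushed_since() are hypothetical. The offset-delta trick also assumes the kernel would write excess data at fd_flush's current offset, the way write(2) does.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Note4 sketch (hypothetical helper): create an anonymous flush file in
 * /tmp and reserve max_flush_size bytes up front. POSIX guarantees that,
 * once posix_fallocate() succeeds, writes inside the reserved range do not
 * fail for lack of disk space. Returns the fd (offset still 0), or -1.
 */
int prealloc_flush_file(off_t max_flush_size)
{
    char path[] = "/tmp/splice-flush-XXXXXX";   /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                     /* file disappears on close() */

    /* posix_fallocate() returns an error number instead of setting errno */
    if (posix_fallocate(fd, 0, max_flush_size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Note3 sketch: assuming excess data is written at fd_flush's current
 * offset, the bytes flushed during one call are the offset delta.
 * Returns -1 for non-seekable fds (lseek() on pipes and sockets fails
 * with ESPIPE), where an explicit *bytes_flushed would still be needed.
 */
off_t flushed_since(int fd_flush, off_t offset_before)
{
    off_t now = lseek(fd_flush, 0, SEEK_CUR);
    return now < 0 ? -1 : now - offset_before;
}
```

In use, the application would record lseek(fd_flush, 0, SEEK_CUR) before the splice call and pass it to flushed_since() afterwards.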
Just look at the beauty of this generalization. fd_flush can be
_anything_. It could be a pipe. It could be a temporary file in /tmp. It
could be a file over the network. fd_flush could be mmap()-ed to
user-space! Or it could even be -1 if the user is not interested in the
error case for the streaming data. (For example, a good portion of video
and audio playback applications are not interested in the fd_out error
case at all: such data can easily lose 'value' if it gets delayed by
more than a few milliseconds and the right answer is to skip the frame
or display an error message, ignoring the lost data.)
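The mmap() case might look like this: after excess data has landed in a regular-file fd_flush, the application maps it read-only to inspect or resend it without copying it through read(). map_flushed() is a hypothetical helper, and it assumes the flushed bytes start at offset 0 of the file.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `flushed` bytes of a regular-file
 * fd_flush read-only, so the excess data can be inspected or resent
 * without a read() copy. Assumes the data starts at offset 0.
 * Returns NULL on failure; release the mapping with munmap().
 */
const unsigned char *map_flushed(int fd_flush, size_t flushed)
{
    if (flushed == 0)
        return NULL;                  /* mmap() rejects zero-length maps */
    void *p = mmap(NULL, flushed, PROT_READ, MAP_SHARED, fd_flush, 0);
    return p == MAP_FAILED ? NULL : (const unsigned char *)p;
}
```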
But for heaven's sake: do not slow down the 99.9999999999% fastpath by
forcing a pipe in between on the ABI level! I really have nothing against
making sys_splice() generic, and I agree that a very good first step to
achieve that is to include pipes in the implementation, but I don't think
pipes are (or should be) all that critical and fundamental to the splice
data-streaming concept itself, as you are suggesting.
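For comparison, this is roughly what the pipe-in-between model looks like from userspace, using splice(2) as it was eventually merged in Linux 2.6.17 (the call signature and SPLICE_F_MOVE are as documented in the splice(2) man page; copy_via_pipe() is a hypothetical name). Every chunk costs two system calls, and when the second one fails the unsent data is still sitting in the pipe: that pipe is the buffer Linus argues error recovery needs, and the extra hop is what Ingo wants kept off the fastpath.

```c
#define _GNU_SOURCE                   /* splice() and SPLICE_F_MOVE */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes fd_in -> pipe -> fd_out with splice(2).
 * Each chunk takes two calls. Returns bytes copied, or -1 on error.
 */
ssize_t copy_via_pipe(int fd_in, int fd_out, size_t len)
{
    int p[2];
    size_t total = 0;

    if (pipe(p) < 0)
        return -1;
    while (total < len) {
        ssize_t n_in = splice(fd_in, NULL, p[1], NULL, len - total,
                              SPLICE_F_MOVE);
        if (n_in < 0 && errno == EINTR)
            continue;
        if (n_in < 0)
            goto fail;
        if (n_in == 0)
            break;                            /* EOF on fd_in */

        ssize_t left = n_in;
        while (left > 0) {
            ssize_t n_out = splice(p[0], NULL, fd_out, NULL, (size_t)left,
                                   SPLICE_F_MOVE);
            if (n_out < 0 && errno == EINTR)
                continue;
            if (n_out <= 0)
                goto fail;  /* unsent data is still in the pipe here; a real
                               caller could drain it elsewhere instead */
            left -= n_out;
        }
        total += (size_t)n_in;
    }
    close(p[0]);
    close(p[1]);
    return (ssize_t)total;

fail:
    close(p[0]);
    close(p[1]);
    return -1;
}
```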
> Also, for signal handling, you need to have some way to keep the pipe
> around for several iterations on the sender side, while still
> returning to user space to do the signal handler.
I believe the signal case is naturally handled by the fd_flush approach
too - in fact it can also act as a nice tester for the exception
handling mechanism.
If the application in question expects to get many signals then it can
use a pipe as fd_flush. (But signal-heavy apps are quite rare: most
performance-critical apps avoid them in the fastpath like the plague,
since on modern CPUs it's more expensive to receive and handle a single
signal than to create and tear down a completely new thread (!))
Ingo