All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jeff Garzik <jeff@garzik.org>, Jens Axboe <axboe@suse.de>,
	linux-kernel@vger.kernel.org, akpm@osdl.org,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] splice support #2
Date: Fri, 31 Mar 2006 09:09:32 +0200	[thread overview]
Message-ID: <20060331070931.GA25853@elte.hu> (raw)
In-Reply-To: <Pine.LNX.4.64.0603301259220.27203@g5.osdl.org>


* Linus Torvalds <torvalds@osdl.org> wrote:

> In particular, what happens when you try to connect two streaming 
> devices, but the destination stops accepting data? You cannot put the 
> received data "back" into the streaming source any way - so if you 
> actually want to be able to handle error recovery, you _have_ to get 
> access to the source buffers.

i'd rather implement this error case as an exception mechanism, instead 
of a forced intermediary buffer mechanism.

We should extend the userspace API so that it is prepared to receive 
'excess data' via a separate 'flush excess data to' file descriptor:

	sys_splice(fd_in, fd_out, fd_flush, size,
                   max_flush_size, *bytes_flushed)

Note1: fd_flush can be a pipe too! This would avoid copies in the 
       exception case - if the exception case is expected to be common.

Note2: max_flush_size serves as hint and as a natural 'buffering limit' 
       for the kernel-internal loops. I believe it's more natural than 
       the implicit 'pipe buffering limit' we currently have.
       max_flush_size == 0 would say to the kernel: 'use whatever 
       buffering is natural or necessary'. E.g. if fd_flush is a pipe, 
       it would automatically set the buffering size to the flush-pipe's
       internal buffering limit.

Note3: we could even eliminate the "*bytes_flushed" parameter from 
       the syscall: as fd_flush's seek offset gives userspace an idea 
       about how much data was written to it.

Note4: if the user messes up fd_flush so that the kernel's "excess data" 
       transfer into fd_flush failes then that's 'tough luck' and flush 
       data may be lost. Users can use pipes [if the exception case is 
       common and they want to optimize that codepath] or can pre-write 
       their files if they need a 100% guarantee.
       In fact, the kernel doesnt even have to _look up_ fd_flush in the 
       common case. It's the application's responsibility to make sure 
       the exception case will work. This means that the _only_ overhead 
       from this exception mechanism are the 2-3 extra parameters to
       sys_splice(). That's _much_ faster.

Just look at the beauty of this generalization. fd_flush can be 
_anything_. It could be a pipe. It could be a temporary file in /tmp. It 
could be a file over the network. fd_flush could be mmap()-ed to 
user-space! Or it could even be -1 if the user is not interested in the 
error case for the streaming data. (For example a good portion of video 
and audio playback applications are not interested in the fd_out error 
case at all: such data can easily lose 'value' if it gets delayed by 
more than a few milliseconds and the right answer is to skip the frame 
or display an error message, ignoring the lost data.)

But for heaven's sake: do not slow down the 99.9999999999% fastpath by 
forcing a pipe inbetween on the ABI level! I really have nothing against 
making sys_splice() generic and i agree that a very good first step to 
achieve that is to include pipes in the implementation, but i dont think 
pipes are (or should be) all that critical and fundamental to the splice 
data-streaming concept itself, as you are suggesting.

> Also, for signal handling, you need to have some way to keep the pipe 
> around for several iterations on the sender side, while still 
> returning to user space to do the signal handler.

i believe the signal case is naturally handled by the fd_flush approach 
too - in fact it can also acts as a nice tester for the exception 
handling mechanism.

If the application in question expects to get many signals then it can 
use a pipe as fd_flush. (But signal-heavy apps are quite rare: most 
performance-critical apps avoid them for the fastpath like the plague, 
on modern CPUs it's more expensive to receive and handle a single signal 
than to create and tear down a completely new thread (!))

	Ingo

  parent reply	other threads:[~2006-03-31  7:12 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-30 10:06 [PATCH] splice support #2 Jens Axboe
2006-03-30 10:16 ` Andrew Morton
2006-03-30 10:24   ` Jens Axboe
2006-03-30 11:16     ` Andrew Morton
2006-03-30 11:55       ` Jens Axboe
2006-03-30 12:30         ` Jens Axboe
2006-03-30 12:30           ` Jens Axboe
2006-03-30 19:19           ` Andrew Morton
2006-03-30 12:00 ` Ingo Molnar
2006-03-30 12:05   ` Jens Axboe
2006-03-30 12:10     ` Ingo Molnar
2006-03-30 12:16       ` Jens Axboe
2006-03-30 12:38         ` Ingo Molnar
2006-03-30 12:42           ` Jens Axboe
2006-03-30 12:42             ` Ingo Molnar
2006-03-30 13:02               ` Jens Axboe
2006-03-30 14:20           ` Christoph Hellwig
2006-03-30 17:02     ` Linus Torvalds
2006-03-30 17:17       ` Linus Torvalds
2006-03-31 20:38         ` Hua Zhong
2006-03-31 20:49           ` Linus Torvalds
2006-03-30 20:48       ` Jeff Garzik
2006-03-30 21:16         ` Linus Torvalds
2006-03-31  0:59           ` Nick Piggin
2006-03-31  2:43             ` Andrew Morton
2006-03-31  2:51               ` Andrew Morton
2006-03-31  3:20                 ` Nick Piggin
2006-03-31  6:35                 ` Christoph Hellwig
2006-03-31  7:09           ` Ingo Molnar [this message]
2006-04-02 22:33             ` Pavel Machek
2006-03-31 12:46           ` Bernd Petrovitsch
2006-03-31  9:56       ` Jens Axboe
2006-03-31 12:18       ` Ingo Molnar
2006-03-31 12:23         ` Jens Axboe
2006-03-31 12:26           ` Jens Axboe
2006-03-31 12:47             ` Ingo Molnar
2006-03-31 18:18               ` Jens Axboe
2006-03-31 12:27       ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2006-03-31  0:03 linux
2006-03-31  6:06 tridge
2006-03-31  6:59 ` Antonio Vargas
2006-03-31  7:37   ` tridge
2006-03-31  9:57 ` Jens Axboe
2006-03-31 19:11   ` Linus Torvalds
2006-03-31 19:40     ` Jens Axboe
2006-04-04 17:16       ` Andy Lutomirski
2006-04-04 17:34         ` Jens Axboe
     [not found] <5W2gv-Tp-19@gated-at.bofh.it>
     [not found] ` <5W48C-3KW-17@gated-at.bofh.it>
     [not found]   ` <5W48D-3KW-21@gated-at.bofh.it>
     [not found]     ` <5W8OT-2ms-17@gated-at.bofh.it>
     [not found]       ` <5WcfS-7x9-23@gated-at.bofh.it>
     [not found]         ` <5WcIT-8nr-13@gated-at.bofh.it>
     [not found]           ` <5Wm5I-53z-7@gated-at.bofh.it>
     [not found]             ` <5XjoS-8t9-11@gated-at.bofh.it>
2006-04-03 12:39               ` Bodo Eggert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060331070931.GA25853@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@osdl.org \
    --cc=axboe@suse.de \
    --cc=hch@infradead.org \
    --cc=jeff@garzik.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.