From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>, CAI Qian <caiqian@redhat.com>,
linux-xfs <linux-xfs@vger.kernel.org>,
xfs@oss.sgi.com, Jens Axboe <axboe@kernel.dk>,
Nick Piggin <npiggin@gmail.com>,
linux-fsdevel@vger.kernel.org
Subject: [RFC][CFT] splice_read reworked
Date: Fri, 23 Sep 2016 20:00:32 +0100
Message-ID: <20160923190032.GA25771@ZenIV.linux.org.uk>
In-Reply-To: <20160917190023.GA8039@ZenIV.linux.org.uk>
The series is supposed to solve the locking order problems for
->splice_read() and get rid of code duplication between the read-side
methods.
pipe_lock is lifted out of ->splice_read() instances, along with
waiting for empty space in pipe, etc. - we do that stuff in callers.
A new variant of iov_iter is introduced; it's backed by a pipe.
With it, copy_to_iter() allocates pages and copies into those, while
copy_page_to_iter() just sticks a reference to that page into the pipe.
Running out of space in the pipe yields a short read, as a fault in an
iovec-backed iov_iter would have. Enough primitives are implemented for
normal ->read_iter() instances to work.
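To make the difference between the two primitives concrete, here is a toy
userspace model. It is only a sketch and not the code in lib/iov_iter.c:
every toy_* name, the 4-slot capacity and the 4096-byte page size are made up
for illustration, pages are not refcounted, and data is never merged into a
partially filled buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096u  /* assumed page size for the model */
#define TOY_PIPE_SLOTS 4u     /* assumed pipe capacity, in buffers */

/* One pipe buffer: a page plus the byte range in use. */
struct toy_buf {
	unsigned char *page;
	size_t offset, len;
	int owned;  /* 1 if toy_copy_to_iter() allocated the page */
};

/* Hypothetical stand-in for a pipe-backed iterator. */
struct toy_pipe_iter {
	struct toy_buf bufs[TOY_PIPE_SLOTS];
	size_t nbufs;
};

/*
 * Models copy_to_iter(): allocate fresh pages and copy the data in.
 * Returns the number of bytes copied; less than len is a short read
 * caused by running out of slots (or memory).
 */
static size_t toy_copy_to_iter(struct toy_pipe_iter *it,
			       const void *src, size_t len)
{
	const unsigned char *p = src;
	size_t done = 0;

	while (done < len && it->nbufs < TOY_PIPE_SLOTS) {
		struct toy_buf *b = &it->bufs[it->nbufs];
		size_t chunk = len - done;

		if (chunk > TOY_PAGE_SIZE)
			chunk = TOY_PAGE_SIZE;
		b->page = malloc(TOY_PAGE_SIZE);
		if (!b->page)
			break;
		memcpy(b->page, p + done, chunk);
		b->offset = 0;
		b->len = chunk;
		b->owned = 1;
		it->nbufs++;
		done += chunk;
	}
	return done;
}

/*
 * Models copy_page_to_iter(): no copying, the slot just points at the
 * caller's page. Returns 0 if the pipe is already full.
 */
static size_t toy_copy_page_to_iter(struct toy_pipe_iter *it,
				    unsigned char *page,
				    size_t offset, size_t len)
{
	struct toy_buf *b;

	assert(it != NULL && page != NULL);
	if (it->nbufs == TOY_PIPE_SLOTS || offset >= TOY_PAGE_SIZE)
		return 0;
	if (len > TOY_PAGE_SIZE - offset)
		len = TOY_PAGE_SIZE - offset;
	b = &it->bufs[it->nbufs++];
	b->page = page;
	b->offset = offset;
	b->len = len;
	b->owned = 0;
	return len;
}

/* Free the pages the model allocated; referenced pages stay with the caller. */
static void toy_release(struct toy_pipe_iter *it)
{
	for (size_t i = 0; i < it->nbufs; i++)
		if (it->bufs[i].owned)
			free(it->bufs[i].page);
	it->nbufs = 0;
}
```

In this model, referencing one page and then asking toy_copy_to_iter() for
five pages' worth of data copies only three pages, since the fourth slot is
already taken; the caller sees that as a short read rather than an error.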
generic_file_splice_read() is switched to feeding such an iov_iter to the
->read_iter() instance. That turns out to be enough to kill almost all
->splice_read() instances; the only ones _not_ using generic_file_splice_read()
or default_file_splice_read() (== no zero-copy fallback) are
fuse_dev_splice_read(), 3 instances in kernel/{relay.c,trace/trace.c} and
sock_splice_read(). It's almost certainly possible to convert the fuse one,
and the same might be possible for the socket one. The relay and tracing
stuff is just plain weird; it might or might not be doable.
Something hopefully working is in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.splice_read
Several commits in that pipe (#1, #8 and #9) are trivial cleanups and fixes
for crap caught while doing the rest, probably ought to be separated.
Shortlog:
Al Viro (11):
fix memory leaks in tracing_buffers_splice_read()
splice_to_pipe(): don't open-code wakeup_pipe_readers()
splice: switch get_iovec_page_array() to iov_iter
splice: lift pipe_lock out of splice_to_pipe()
skb_splice_bits(): get rid of callback
new helper: add_to_pipe()
fuse_dev_splice_read(): switch to add_to_pipe()
cifs: don't use memcpy() to copy struct iov_iter
fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()
new iov_iter flavour: pipe-backed
switch generic_file_splice_read() to use of ->read_iter()
Diffstat:
drivers/staging/lustre/lustre/llite/file.c | 70 +--
.../staging/lustre/lustre/llite/llite_internal.h | 15 +-
drivers/staging/lustre/lustre/llite/vvp_internal.h | 14 -
drivers/staging/lustre/lustre/llite/vvp_io.c | 45 +-
fs/cifs/file.c | 14 +-
fs/coda/file.c | 23 +-
fs/fuse/dev.c | 48 +-
fs/fuse/file.c | 30 +-
fs/gfs2/file.c | 28 +-
fs/nfs/file.c | 25 +-
fs/nfs/internal.h | 2 -
fs/nfs/nfs4file.c | 2 +-
fs/ocfs2/file.c | 34 +-
fs/ocfs2/ocfs2_trace.h | 2 -
fs/splice.c | 578 +++++++--------------
fs/xfs/xfs_file.c | 41 +-
fs/xfs/xfs_trace.h | 1 -
include/linux/fs.h | 2 -
include/linux/skbuff.h | 8 +-
include/linux/splice.h | 3 +
include/linux/uio.h | 14 +-
kernel/trace/trace.c | 14 +-
lib/iov_iter.c | 390 +++++++++++++-
mm/shmem.c | 115 +---
net/core/skbuff.c | 28 +-
net/ipv4/tcp.c | 3 +-
net/kcm/kcmsock.c | 16 +-
net/unix/af_unix.c | 17 +-
28 files changed, 648 insertions(+), 934 deletions(-)
It's not all I would like to do there (in particular, I haven't
done the fuse splice_read conversion to read_iter, even though it does appear
to be doable; that'll take copy_page_to_iter_nosteal() as a new primitive
+ a considerable amount of massage in fs/fuse/dev.c), but it should at least
* make pipe lock the outermost
* switch generic_file_splice_read() to ->read_iter(), making
  it suitable for lustre/coda/gfs2/ocfs2/xfs/shmem without any wrappers
* somewhat simplify socket ->splice_read() guts (not by much - to
  start doing that right we'd need the same new primitive)
* remove a considerable pile of code
* get rid of a bunch of splice_{grow,shrink}_spd/splice_to_pipe
  callers; the remaining ones are in default_file_splice_read() (trivially
  killable by conversion to iov_iter_get_pages_alloc(), followed by the same
  build iovec array + use kernel_readv as we do now + iov_iter_advance to
  the length returned by kernel_readv), the kernel/relay and
  kernel/trace/trace.c ones (should switch to add_to_pipe(), AFAICS) and
  skb_splice_bits() (again, a matter of copy_page_to_iter_nosteal(), which
  will take out spd_can_coalesce/spd_fill_page in there as well). Once the
  remaining ones are taken care of, splice_pipe_desc and friends will go away.
In its current form it survives LTP, xfstests and overlayfs testsuite;
if anybody has additional tests for splice and friends, I would like to hear
about such. It really needs more beating, though.
Please, help with review and testing.