From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>, Sage Weil <sage@inktank.com>,
Mark Fasheh <mfasheh@suse.com>,
xfs@oss.sgi.com, Steve French <sfrench@samba.org>,
Joel Becker <jlbec@evilplan.org>,
linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0/5] splice: locking changes and code refactoring
Date: Sat, 18 Jan 2014 06:40:40 +0000
Message-ID: <20140118064040.GE10323@ZenIV.linux.org.uk>
In-Reply-To: <20140114172033.GU10323@ZenIV.linux.org.uk>

On Tue, Jan 14, 2014 at 05:20:33PM +0000, Al Viro wrote:
> On Tue, Jan 14, 2014 at 05:22:07AM -0800, Christoph Hellwig wrote:
> > On Mon, Jan 13, 2014 at 11:56:46PM +0000, Al Viro wrote:
> > > On Mon, Jan 13, 2014 at 06:14:16AM -0800, Christoph Hellwig wrote:
> > > > ping? Would be nice to get this into 3.14
> > >
> > > Umm... The reason for pipe_lock outside of ->i_mutex is this:
> > > default_file_splice_write() calls splice_from_pipe() with
> > > write_pipe_buf for callback. splice_from_pipe() calls that
> > > callback under pipe_lock(pipe). And write_pipe_buf() calls
> > > __kernel_write(), which certainly might want to take ->i_mutex.
> > >
> > > Now, this codepath isn't taken for files that have non-NULL
> > > ->splice_write(), so that's not an issue for XFS and OCFS2,
> > > but having pipe_lock nest between the ->i_mutex for filesystems
> > > that do and do not have ->splice_write()... Ouch...
> >
> > What would be the alternative? Duplicating the code in even more
> > filesystems to enforce a non-natural locking order for filesystems
> > actually implementing splice? There don't actually seem to be a whole
> > lot of real filesystems not implementing splice_write, the prime use
> > would be for device drivers or synthetic ones. I'm not even sure
> > how much that fallback gets used in practice.

Hmm... In principle, the following would be no worse than what
generic_file_splice_write() is doing: confirm and map the pages, build
an iovec and use ->aio_write() to write it out, then unmap the suckers,
release ones entirely written to file and adjust the partially
written one. All under pipe_lock(). Hell, if we introduce
kernel_writev() (either by calling vfs_writev() or taking do_readv_writev()
sans copying iovec and using that under set_fs()), we could switch
default_file_splice_write() to that and get rid of ->splice_write() for
the majority of filesystems, if not all of them.

Sure, it means copying from pipe buffers to pagecache, but we have
generic_file_splice_write() do that copy anyway - conditional memcpy()
in pipe_to_file() is actually unconditional; that if (page != buf->page) in
there had just been forgotten by Nick back in 2007 ("1/2 splice: dont steal").

Objections, comments?

The problem Christoph was talking about is that generic_file_splice_write()
plays with ->i_mutex and both gets/drops it for each page of IO *and*
causes PITA for any fs that wants some locks of its own taken in addition
to ->i_mutex on the write paths. What ->splice_write() without page
stealing is doing is pretty much a writev() from array of pages in kernel
space; so it looks like we might as well just reuse writev() guts for that...
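
To make the shape of that concrete, here is a rough userspace model of the
steps above (build an iovec over the buffers, do one vectored write, release
the buffers that were fully written and adjust the partially written one).
It is only a sketch, not kernel code: struct model_buf, struct model_pipe,
MODEL_MAX_BUFS, model_consume() and model_splice_write() are all made-up
names, plain writev(2) stands in for ->aio_write(), and there is no
pipe_lock, page mapping or ->confirm() here at all.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MODEL_MAX_BUFS 16	/* hypothetical ring size for this model */

/* Hypothetical stand-in for a pipe buffer: a window into one "page". */
struct model_buf {
	const char *page;	/* already "mapped" data */
	size_t offset;		/* start of unwritten data within page */
	size_t len;		/* bytes of unwritten data */
};

/* Hypothetical stand-in for the pipe: a ring of buffers. */
struct model_pipe {
	struct model_buf bufs[MODEL_MAX_BUFS];
	size_t curbuf;		/* index of the first occupied slot */
	size_t nrbufs;		/* number of occupied slots */
};

/*
 * Account for nbytes having been written: release buffers that were
 * written out entirely, adjust the one that was written partially.
 */
void model_consume(struct model_pipe *pipe, size_t nbytes)
{
	while (nbytes > 0 && pipe->nrbufs > 0) {
		struct model_buf *b = &pipe->bufs[pipe->curbuf];

		if (nbytes >= b->len) {
			/* fully written: drop it from the ring */
			nbytes -= b->len;
			b->len = 0;
			pipe->curbuf = (pipe->curbuf + 1) % MODEL_MAX_BUFS;
			pipe->nrbufs--;
		} else {
			/* partially written: keep the tail */
			b->offset += nbytes;
			b->len -= nbytes;
			nbytes = 0;
		}
	}
}

/* One vectored write covering everything currently in the pipe. */
ssize_t model_splice_write(struct model_pipe *pipe, int fd)
{
	struct iovec iov[MODEL_MAX_BUFS];
	size_t n = pipe->nrbufs;
	ssize_t written;

	assert(n <= MODEL_MAX_BUFS);
	if (n == 0)
		return 0;

	/* build the iovec in ring order */
	for (size_t i = 0; i < n; i++) {
		struct model_buf *b =
			&pipe->bufs[(pipe->curbuf + i) % MODEL_MAX_BUFS];

		iov[i].iov_base = (void *)(b->page + b->offset);
		iov[i].iov_len = b->len;
	}

	written = writev(fd, iov, (int)n);
	if (written > 0)
		model_consume(pipe, (size_t)written);
	return written;
}
```

A short write leaves the remaining data in the model pipe with offset and
len already adjusted, so the caller can simply call model_splice_write()
again.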