linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>, Mark Fasheh <mfasheh@suse.com>,
	Joel Becker <jlbec@evilplan.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	xfs@oss.sgi.com, Sage Weil <sage@inktank.com>,
	Steve French <sfrench@samba.org>
Subject: [RFC] unifying write variants for filesystems
Date: Sun, 19 Jan 2014 05:13:35 +0000
Message-ID: <20140119051335.GN10323@ZenIV.linux.org.uk>
In-Reply-To: <20140118201031.GI10323@ZenIV.linux.org.uk>

On Sat, Jan 18, 2014 at 08:10:31PM +0000, Al Viro wrote:

> Ouch...  No, I hadn't meant that kind of insanity, but I'd missed the
> problem with scarcity of mappings completely...

OK, that pretty much kills this approach.  Pity...

Folks, what do you think about the following:
	* a new data structure:
struct io_source {
	enum {IO_UIOVEC, IO_PVEC} type;
	union {
		struct iovec *iov;
		struct pvec {
			struct page *page;
			unsigned offset;
			unsigned size;
		} *pvec;
	};
};
	* a new method that would look like aio_write, but take
struct io_source instead of iovec.
	* store the type in iov_iter (normally IO_UIOVEC) and teach the
code dealing with it to do the right thing depending on type.  I.e. instead
of __copy_from_user_inatomic() do kmap_atomic()/memcpy()/kunmap_atomic() if
it's an IO_PVEC.
	* an analog of generic_file_aio_write() for the new method, converging
with generic_file_aio_write() almost immediately (basically, as soon as the
iov_iter has been initialized).
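	* new_aio_write() consisting of
ssize_t new_aio_write(struct kiocb *iocb, const struct iovec *iov,
		      unsigned long nr_segs, loff_t pos)
{
	struct io_source source = {.type = IO_UIOVEC,
				   .iov = (struct iovec *)iov};
	return iocb->ki_filp->f_op-><new_method>(iocb, &source, nr_segs, pos);
}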
	* new_sync_write(), doing what do_sync_write() does for files
that have new_aio_write() as ->aio_write().
	* new_splice_write(), usable for files that provide that method -
it would collect pipe_buffers, put together a struct pvec array and pass
it to that method.  All mapping of the pages would happen one-by-one
and only around the actual copying of the data.  And, of course, the locking
would be identical to what we do for write()/writev()/aio write.
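To make the iov_iter item concrete, here is a rough userspace model of the copy step, under loudly stated assumptions: struct page is just a 4096-byte array, kmap_atomic_model()/kunmap_atomic_model() stand in for kmap_atomic()/kunmap_atomic(), memcpy() stands in for __copy_from_user_inatomic(), and copy_from_source() is a made-up name.  The enumerators use the IO_UIOVEC/IO_PVEC spelling.  Not kernel code, only a sketch of dispatching on the type tag.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct page {
	unsigned char data[PAGE_SIZE];
};

struct pvec {
	struct page *page;
	unsigned offset;
	unsigned size;
};

struct io_source {
	enum { IO_UIOVEC, IO_PVEC } type;
	union {
		const struct iovec *iov;
		const struct pvec *pvec;
	};
};

/* Stand-ins for kmap_atomic()/kunmap_atomic(); the kernel would map a
 * possibly-highmem page here and unmap it right afterwards. */
static void *kmap_atomic_model(struct page *page)
{
	return page->data;
}

static void kunmap_atomic_model(void *kaddr)
{
	(void)kaddr;
}

/* Copy all nr_segs segments of src into dst (which must be big enough).
 * Returns the number of bytes copied. */
size_t copy_from_source(void *dst, const struct io_source *src,
			unsigned long nr_segs)
{
	unsigned char *out = dst;

	for (unsigned long i = 0; i < nr_segs; i++) {
		if (src->type == IO_UIOVEC) {
			/* kernel: __copy_from_user_inatomic() on iov_base */
			memcpy(out, src->iov[i].iov_base, src->iov[i].iov_len);
			out += src->iov[i].iov_len;
		} else {
			const struct pvec *pv = &src->pvec[i];
			/* the page is mapped only around the copy itself */
			unsigned char *kaddr = kmap_atomic_model(pv->page);

			memcpy(out, kaddr + pv->offset, pv->size);
			kunmap_atomic_model(kaddr);
			out += pv->size;
		}
	}
	return (size_t)(out - (unsigned char *)dst);
}
```

Everything past the type check is the same loop for both kinds of source, which is the point of keeping the tag in the iterator rather than duplicating the write path.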
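The new_aio_write() and new_sync_write() items can be modelled the same way.  Everything here is a hypothetical userspace stand-in: fops_model, file_model and kiocb_model play the roles of file_operations, struct file and struct kiocb, write_source is a placeholder for the still-unnamed <new_method>, and the EIOCBQUEUED/wait_on_sync_kiocb() handling that do_sync_write() does is left out.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t, off_t */
#include <sys/uio.h>	/* struct iovec */

struct io_source {
	enum { IO_UIOVEC, IO_PVEC } type;
	union {
		const struct iovec *iov;
		const void *pvec;	/* page vectors unused in this sketch */
	};
};

struct kiocb_model;

/* Hypothetical stand-in for file_operations carrying the new method. */
struct fops_model {
	ssize_t (*write_source)(struct kiocb_model *iocb,
				const struct io_source *src,
				unsigned long nr_segs, off_t pos);
};

struct file_model {
	const struct fops_model *f_op;
};

struct kiocb_model {
	struct file_model *ki_filp;
	off_t ki_pos;
};

/* new_aio_write(): wrap the iovec array and hand it to the new method. */
ssize_t new_aio_write(struct kiocb_model *iocb, const struct iovec *iov,
		      unsigned long nr_segs, off_t pos)
{
	struct io_source source = {.type = IO_UIOVEC, .iov = iov};

	return iocb->ki_filp->f_op->write_source(iocb, &source, nr_segs, pos);
}

/* new_sync_write(): single-segment synchronous write that calls the new
 * method directly, without the async wait. */
ssize_t new_sync_write(struct file_model *filp, const char *buf, size_t len,
		       off_t *ppos)
{
	struct iovec iov = {.iov_base = (void *)buf, .iov_len = len};
	struct io_source source = {.type = IO_UIOVEC, .iov = &iov};
	struct kiocb_model kiocb = {.ki_filp = filp, .ki_pos = *ppos};
	ssize_t ret = filp->f_op->write_source(&kiocb, &source, 1, kiocb.ki_pos);

	*ppos = kiocb.ki_pos;
	return ret;
}

/* Demo method (IO_UIOVEC only): "writes" every byte by advancing ki_pos. */
ssize_t demo_write_source(struct kiocb_model *iocb,
			  const struct io_source *src,
			  unsigned long nr_segs, off_t pos)
{
	size_t total = 0;

	for (unsigned long i = 0; i < nr_segs; i++)
		total += src->iov[i].iov_len;
	iocb->ki_pos = pos + (off_t)total;
	return (ssize_t)total;
}
```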
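For the new_splice_write() item, a minimal model of gathering pipe_buffers into a struct pvec array, assuming the pipe is a power-of-two ring indexed by curbuf/nrbufs.  pipe_model, pipe_buffer_model, collect_pvec() and splice_write_model() are made-up names; locking, short writes and releasing consumed buffers are deliberately omitted.

```c
#include <sys/types.h>	/* ssize_t */

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

#define PIPE_SLOTS 16	/* ring size; must be a power of two */

struct page { unsigned char data[PAGE_SIZE]; };	/* hypothetical stand-in */

struct pvec {
	struct page *page;
	unsigned offset;
	unsigned size;
};

struct io_source {
	enum { IO_UIOVEC, IO_PVEC } type;
	union {
		const void *iov;	/* unused here */
		const struct pvec *pvec;
	};
};

/* Hypothetical pipe model: a ring of buffers. */
struct pipe_buffer_model {
	struct page *page;
	unsigned offset;
	unsigned len;
};

struct pipe_model {
	struct pipe_buffer_model bufs[PIPE_SLOTS];
	unsigned curbuf;	/* first occupied slot */
	unsigned nrbufs;	/* number of occupied slots */
};

/* Gather the occupied buffers, in order, into out[]; returns the count. */
unsigned long collect_pvec(const struct pipe_model *pipe, struct pvec *out)
{
	for (unsigned i = 0; i < pipe->nrbufs; i++) {
		const struct pipe_buffer_model *b =
			&pipe->bufs[(pipe->curbuf + i) & (PIPE_SLOTS - 1)];

		out[i] = (struct pvec){ .page = b->page, .offset = b->offset,
					.size = b->len };
	}
	return pipe->nrbufs;
}

typedef ssize_t (*write_source_fn)(const struct io_source *src,
				   unsigned long nr_segs);

/* Build the pvec array and pass it to the method in one call. */
ssize_t splice_write_model(const struct pipe_model *pipe, write_source_fn method)
{
	struct pvec pv[PIPE_SLOTS];
	unsigned long n = collect_pvec(pipe, pv);
	struct io_source src = {.type = IO_PVEC, .pvec = pv};

	return method(&src, n);
}

/* Demo method: report how many bytes it was offered. */
ssize_t demo_sum(const struct io_source *src, unsigned long nr_segs)
{
	ssize_t total = 0;

	for (unsigned long i = 0; i < nr_segs; i++)
		total += src->pvec[i].size;
	return total;
}
```

Note that nothing is mapped while collecting; only the page pointers travel, so mapping stays confined to the copy loop inside the method.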

	Then filesystems can switch to that new method, flipping their
aio_write() instances to the new type and replacing ->aio_write
with new_aio_write, ->write with new_sync_write and ->splice_write with
new_splice_write.

	Actually, it might be possible to use it for *all* instances of
->splice_write() - we'd need to store a pointer to a "call this to try
and steal this page" function in pvec and allow the method to do the
actual stealing.  Note that the pipe_buffer ->steal() instances only use
the page argument - they all ignore which pipe it's in (and there's nothing
they could usefully do if they knew which pipe it had been in to begin with).
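A sketch of that stealing variant, assuming a hypothetical steal hook stored in each pvec that, like ->steal(), only needs the page and returns 0 when the page may be taken.  write_pvec(), write_stats and the demo hooks are illustrative names only; a real method would also have to deal with page state and inserting the page into the page cache.

```c
#include <stddef.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

struct page { unsigned char data[PAGE_SIZE]; };	/* hypothetical stand-in */

/* pvec extended with a hypothetical steal hook; it gets only the page. */
struct pvec {
	struct page *page;
	unsigned offset;
	unsigned size;
	int (*steal)(struct page *page);	/* 0: caller may take the page */
};

struct write_stats {
	unsigned long stolen_pages;
	size_t copied_bytes;
};

/* Prefer stealing whole pages; copy everything else. */
struct write_stats write_pvec(const struct pvec *pv, unsigned long nr_segs)
{
	struct write_stats st = {0, 0};

	for (unsigned long i = 0; i < nr_segs; i++) {
		int whole = pv[i].offset == 0 && pv[i].size == PAGE_SIZE;

		if (whole && pv[i].steal && pv[i].steal(pv[i].page) == 0)
			st.stolen_pages++;	/* would go into the fs mapping */
		else
			st.copied_bytes += pv[i].size;	/* fall back to copying */
	}
	return st;
}

/* Demo hooks: one always lets go of the page, one never does. */
int steal_always(struct page *page) { (void)page; return 0; }
int steal_never(struct page *page) { (void)page; return -1; }
```

The method decides per segment, so a refused or partial page simply takes the copy path.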

	This is very preliminary, of course, and I might easily have missed
something - the previous idea was unworkable, after all.  Comments
would be very welcome...


Thread overview: 49+ messages
2013-12-12 18:14 [PATCH 0/5] splice: locking changes and code refactoring Christoph Hellwig
2013-12-12 18:15 ` [PATCH 1/5] splice: move balance_dirty_pages_ratelimited into pipe_to_file Christoph Hellwig
2013-12-12 18:15 ` [PATCH 2/5] splice: nest i_mutex outside pipe_lock Christoph Hellwig
2013-12-12 18:15 ` [PATCH 3/5] splice: use splice_from_pipe in generic_file_splice_write Christoph Hellwig
2013-12-12 18:15 ` [PATCH 4/5] xfs: fix splice_write locking Christoph Hellwig
2013-12-12 18:15 ` [PATCH 5/5] splice: stop exporting splice_from_pipe implementation details Christoph Hellwig
2014-01-13 14:14 ` [PATCH 0/5] splice: locking changes and code refactoring Christoph Hellwig
2014-01-13 23:56   ` Al Viro
2014-01-14 13:22     ` Christoph Hellwig
2014-01-14 17:20       ` Al Viro
2014-01-15 18:10         ` Al Viro
2014-01-18  6:40         ` Al Viro
2014-01-18  7:22           ` Linus Torvalds
2014-01-18  7:46             ` Al Viro
2014-01-18  7:56               ` Al Viro
2014-01-18  8:27               ` Al Viro
2014-01-18  8:44                 ` David Miller
2014-02-07 17:10                   ` Al Viro
2014-01-18 19:59               ` Linus Torvalds
2014-01-18 20:10                 ` Al Viro
2014-01-18 20:27                   ` Al Viro
2014-01-18 20:30                     ` Al Viro
2014-01-19  5:13                   ` Al Viro [this message]
2014-01-20 13:55                     ` [RFC] unifying write variants for filesystems Christoph Hellwig
2014-01-20 20:32                       ` Linus Torvalds
2014-02-01 22:43                         ` Al Viro
2014-02-02  0:13                           ` Linus Torvalds
2014-02-02  2:02                             ` Al Viro
2014-02-02 19:21                           ` Al Viro
2014-02-02 19:23                             ` Al Viro
2014-02-03 14:41                             ` Miklos Szeredi
2014-02-03 15:33                               ` Al Viro
2014-02-02 23:16                           ` Anton Altaparmakov
2014-02-03 15:12                           ` Christoph Hellwig
2014-02-03 16:24                             ` Al Viro
2014-02-03 16:50                             ` Dave Kleikamp
2014-02-03 16:23                           ` Dave Kleikamp
2014-02-04 12:44                             ` Al Viro
2014-02-04 12:52                               ` Kent Overstreet
2014-02-04 15:17                                 ` Al Viro
2014-02-04 17:27                                   ` Zach Brown
2014-02-04 17:35                                     ` Kent Overstreet
2014-02-04 18:08                                       ` Al Viro
2014-02-04 18:00                                     ` Al Viro
2014-02-04 18:33                                       ` Zach Brown
2014-02-04 18:36                                         ` Al Viro
2014-02-05 19:58                                           ` Al Viro
2014-02-05 20:42                                             ` Zach Brown
2014-02-06  9:08                                             ` Kent Overstreet
