All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>, Sage Weil <sage@inktank.com>,
	Mark Fasheh <mfasheh@suse.com>,
	xfs@oss.sgi.com, Steve French <sfrench@samba.org>,
	Joel Becker <jlbec@evilplan.org>,
	linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0/5] splice: locking changes and code refactoring
Date: Wed, 15 Jan 2014 18:10:27 +0000	[thread overview]
Message-ID: <20140115181027.GA8077@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20140114172033.GU10323@ZenIV.linux.org.uk>

On Tue, Jan 14, 2014 at 05:20:33PM +0000, Al Viro wrote:
> And that's less than half of fs/*...  I'm not saying that the current
> situation on the write side is good; hell, just the mess with write/aio_write
> alone is ugly enough - we have
> 	* a bunch of file_operations without ->aio_write(); simple enough.
> 	* a bunch with ->write == do_sync_write.  Also simple.
> 	* several with NULL ->write and non-NULL ->aio_write(); same as
> do_sync_write() for ->write (socket, android/logger, kmsg, macvtap)
> 	* several with ->aio_write being an optimized equivalent of
> do_sync_write() (blackhole for /dev/null and /dev/zero, error for bad_inode)
> 	* 9p cached with its "oh, but if we have O_DIRECT we want ->write()
> to be different" (why not use a separate file_operations, then?  It's not
> as if ->open() couldn't switch to it if it sees O_DIRECT...)
> 	* two infinibad things (ipath and qib), with completely unrelated
> semantics on write(2) and writev(2) (the latter shared with aio).  As in
> "writev() of a single-element iovec array does things that do not even
> resemble what write() of the same data would've done".  Yes, really - check
> for yourself.
> 	* snd_pcm - hell knows; it might be that it tries to collect the
> data from iovec and push it in one go, as if it was a single write, but
> then it might be something as bogus as what ipath is doing...
> 	* gadgetfs - hell knows; ep_write() seems to be doing something
> beyond what ep_aio_write() does, but I haven't traced them down the call
> chain...  That one, BTW, won't be fun for splice - looks like it cares
> about datagram boundaries a lot, so it's not obvious what the semantics
> should be.
> 	* lustre.  I _think_ do_sync_write() would work there, but I'm might
> be easily missing something in all those layers of obfusca^Wgood software
> development practices.

BTW, ->read/->aio_read situation is only slighlty better - of file_operations
instances that have ->aio_read, most have do_sync_read() for ->read() (as
they ought to).  Exceptions:
	* 9p O_DIRECT (again)
	* NULL ->read where do_sync_read ought to be (socket, macvtap)
	* optimized ->read (/dev/zero, /dev/null, bad_inode)
	* snd_pcm - magic.  It (and its ->aio_write counterpart) wants exactly
one iovec per channel.  IOW, it's not a general-purpose ->aio_{read,write}
at all - it's a magic API shoehorned into readv(2)/writev(2) (and aio
IOCB_CMD_P{READ,WRITE}V as well).
	* lustre - probably could live with do_sync_read(), but there might
be stack footprint considerations or some really weird magic going on
(the difference is that instead of iocb on stack they appear to be using
per-thread one allocated on heap and hashed by pid, of all things).
It's really weird - they end up doing repeated hash lookups for that
per-thread wastebasket of a structure on different levels of call chain.
Looks like they have swept a lot of local variables of a lot of functions
into that thing; worse, it appears to be one of several dynamically allocated
bits of that thing, hidden behind a bunch of wrappers...  Overall feel is
Lovecraftian, complete with lurking horrors of the deep...  BTW, its ->aio_read
would better never return -EIOCBQUEUED - its ->read does *not* wait for
completion of iocb it has submitted.
	* gadgetfs - it appears to be seriously datagram-oriented; basically,
they want to reduce readv/writev to read/write, not the other way round.

> BTW, speaking of ->aio_write() - why the devil do we pass the pos
> argument (long long, at that) separately, when all call sites provably
> have it equal to iocb->ki_pos?  If nothing else, on a bunch of architectures
> it makes the difference between passing arguments in registers and spilling
> them on stack; moreover, if we do something and only then call
> generic_file_aio_write(), we get to propagate it all way down.  And
> generic_file_aio_write() has had explicit BUG_ON(iocb->ki_pos != pos)
> since 2.5.55, for crying out loud...

The same goes for ->aio_read() (except for s/2.5.55/2.5.39/)...

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>, Mark Fasheh <mfasheh@suse.com>,
	Joel Becker <jlbec@evilplan.org>,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sage Weil <sage@inktank.com>, Steve French <sfrench@samba.org>
Subject: Re: [PATCH 0/5] splice: locking changes and code refactoring
Date: Wed, 15 Jan 2014 18:10:27 +0000	[thread overview]
Message-ID: <20140115181027.GA8077@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20140114172033.GU10323@ZenIV.linux.org.uk>

On Tue, Jan 14, 2014 at 05:20:33PM +0000, Al Viro wrote:
> And that's less than half of fs/*...  I'm not saying that the current
> situation on the write side is good; hell, just the mess with write/aio_write
> alone is ugly enough - we have
> 	* a bunch of file_operations without ->aio_write(); simple enough.
> 	* a bunch with ->write == do_sync_write.  Also simple.
> 	* several with NULL ->write and non-NULL ->aio_write(); same as
> do_sync_write() for ->write (socket, android/logger, kmsg, macvtap)
> 	* several with ->aio_write being an optimized equivalent of
> do_sync_write() (blackhole for /dev/null and /dev/zero, error for bad_inode)
> 	* 9p cached with its "oh, but if we have O_DIRECT we want ->write()
> to be different" (why not use a separate file_operations, then?  It's not
> as if ->open() couldn't switch to it if it sees O_DIRECT...)
> 	* two infinibad things (ipath and qib), with completely unrelated
> semantics on write(2) and writev(2) (the latter shared with aio).  As in
> "writev() of a single-element iovec array does things that do not even
> resemble what write() of the same data would've done".  Yes, really - check
> for yourself.
> 	* snd_pcm - hell knows; it might be that it tries to collect the
> data from iovec and push it in one go, as if it was a single write, but
> then it might be something as bogus as what ipath is doing...
> 	* gadgetfs - hell knows; ep_write() seems to be doing something
> beyond what ep_aio_write() does, but I haven't traced them down the call
> chain...  That one, BTW, won't be fun for splice - looks like it cares
> about datagram boundaries a lot, so it's not obvious what the semantics
> should be.
> 	* lustre.  I _think_ do_sync_write() would work there, but I'm might
> be easily missing something in all those layers of obfusca^Wgood software
> development practices.

BTW, ->read/->aio_read situation is only slighlty better - of file_operations
instances that have ->aio_read, most have do_sync_read() for ->read() (as
they ought to).  Exceptions:
	* 9p O_DIRECT (again)
	* NULL ->read where do_sync_read ought to be (socket, macvtap)
	* optimized ->read (/dev/zero, /dev/null, bad_inode)
	* snd_pcm - magic.  It (and its ->aio_write counterpart) wants exactly
one iovec per channel.  IOW, it's not a general-purpose ->aio_{read,write}
at all - it's a magic API shoehorned into readv(2)/writev(2) (and aio
IOCB_CMD_P{READ,WRITE}V as well).
	* lustre - probably could live with do_sync_read(), but there might
be stack footprint considerations or some really weird magic going on
(the difference is that instead of iocb on stack they appear to be using
per-thread one allocated on heap and hashed by pid, of all things).
It's really weird - they end up doing repeated hash lookups for that
per-thread wastebasket of a structure on different levels of call chain.
Looks like they have swept a lot of local variables of a lot of functions
into that thing; worse, it appears to be one of several dynamically allocated
bits of that thing, hidden behind a bunch of wrappers...  Overall feel is
Lovecraftian, complete with lurking horrors of the deep...  BTW, its ->aio_read
would better never return -EIOCBQUEUED - its ->read does *not* wait for
completion of iocb it has submitted.
	* gadgetfs - it appears to be seriously datagram-oriented; basically,
they want to reduce readv/writev to read/write, not the other way round.

> BTW, speaking of ->aio_write() - why the devil do we pass the pos
> argument (long long, at that) separately, when all call sites provably
> have it equal to iocb->ki_pos?  If nothing else, on a bunch of architectures
> it makes the difference between passing arguments in registers and spilling
> them on stack; moreover, if we do something and only then call
> generic_file_aio_write(), we get to propagate it all way down.  And
> generic_file_aio_write() has had explicit BUG_ON(iocb->ki_pos != pos)
> since 2.5.55, for crying out loud...

The same goes for ->aio_read() (except for s/2.5.55/2.5.39/)...

  reply	other threads:[~2014-01-15 18:10 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-12 18:14 [PATCH 0/5] splice: locking changes and code refactoring Christoph Hellwig
2013-12-12 18:14 ` Christoph Hellwig
2013-12-12 18:15 ` [PATCH 1/5] splice: move balance_dirty_pages_ratelimited into pipe_to_file Christoph Hellwig
2013-12-12 18:15   ` Christoph Hellwig
2013-12-12 18:15 ` [PATCH 2/5] splice: nest i_mutex outside pipe_lock Christoph Hellwig
2013-12-12 18:15   ` Christoph Hellwig
2013-12-12 18:15 ` [PATCH 3/5] splice: use splice_from_pipe in generic_file_splice_write Christoph Hellwig
2013-12-12 18:15   ` Christoph Hellwig
2013-12-12 18:15 ` [PATCH 4/5] xfs: fix splice_write locking Christoph Hellwig
2013-12-12 18:15   ` Christoph Hellwig
2013-12-12 18:15 ` [PATCH 5/5] splice: stop exporting splice_from_pipe implementation details Christoph Hellwig
2013-12-12 18:15   ` Christoph Hellwig
2014-01-13 14:14 ` [PATCH 0/5] splice: locking changes and code refactoring Christoph Hellwig
2014-01-13 14:14   ` Christoph Hellwig
2014-01-13 23:56   ` Al Viro
2014-01-13 23:56     ` Al Viro
2014-01-14 13:22     ` Christoph Hellwig
2014-01-14 13:22       ` Christoph Hellwig
2014-01-14 17:20       ` Al Viro
2014-01-14 17:20         ` Al Viro
2014-01-15 18:10         ` Al Viro [this message]
2014-01-15 18:10           ` Al Viro
2014-01-18  6:40         ` Al Viro
2014-01-18  7:22           ` Linus Torvalds
2014-01-18  7:22             ` Linus Torvalds
2014-01-18  7:46             ` Al Viro
2014-01-18  7:56               ` Al Viro
2014-01-18  7:56                 ` Al Viro
2014-01-18  8:27               ` Al Viro
2014-01-18  8:44                 ` David Miller
2014-01-18  8:44                   ` David Miller
2014-02-07 17:10                   ` Al Viro
2014-02-07 17:10                     ` Al Viro
2014-01-18 19:59               ` Linus Torvalds
2014-01-18 20:10                 ` Al Viro
2014-01-18 20:27                   ` Al Viro
2014-01-18 20:27                     ` Al Viro
2014-01-18 20:30                     ` Al Viro
2014-01-18 20:30                       ` Al Viro
2014-01-19  5:13                   ` [RFC] unifying write variants for filesystems Al Viro
2014-01-19  5:13                     ` Al Viro
2014-01-20 13:55                     ` Christoph Hellwig
2014-01-20 13:55                       ` Christoph Hellwig
2014-01-20 20:32                       ` Linus Torvalds
2014-01-20 20:32                         ` Linus Torvalds
2014-02-01 22:43                         ` Al Viro
2014-02-01 22:43                           ` Al Viro
2014-02-02  0:13                           ` Linus Torvalds
2014-02-02  2:02                             ` Al Viro
2014-02-02  2:02                               ` Al Viro
2014-02-02 19:21                           ` Al Viro
2014-02-02 19:21                             ` Al Viro
2014-02-02 19:23                             ` Al Viro
2014-02-02 19:23                               ` Al Viro
2014-02-03 14:41                             ` Miklos Szeredi
2014-02-03 14:41                               ` Miklos Szeredi
2014-02-03 15:33                               ` Al Viro
2014-02-03 15:33                                 ` Al Viro
2014-02-02 23:16                           ` Anton Altaparmakov
2014-02-02 23:16                             ` Anton Altaparmakov
2014-02-03 15:12                           ` Christoph Hellwig
2014-02-03 16:24                             ` Al Viro
2014-02-03 16:50                             ` Dave Kleikamp
2014-02-03 16:23                           ` Dave Kleikamp
2014-02-04 12:44                             ` Al Viro
2014-02-04 12:44                               ` Al Viro
2014-02-04 12:52                               ` Kent Overstreet
2014-02-04 12:52                                 ` Kent Overstreet
2014-02-04 15:17                                 ` Al Viro
2014-02-04 15:17                                   ` Al Viro
2014-02-04 17:27                                   ` Zach Brown
2014-02-04 17:35                                     ` Kent Overstreet
2014-02-04 18:08                                       ` Al Viro
2014-02-04 18:08                                         ` Al Viro
2014-02-04 18:00                                     ` Al Viro
2014-02-04 18:00                                       ` Al Viro
2014-02-04 18:33                                       ` Zach Brown
2014-02-04 18:36                                         ` Al Viro
2014-02-04 18:36                                           ` Al Viro
2014-02-05 19:58                                           ` Al Viro
2014-02-05 20:42                                             ` Zach Brown
2014-02-06  9:08                                             ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140115181027.GA8077@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=jlbec@evilplan.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=sage@inktank.com \
    --cc=sfrench@samba.org \
    --cc=torvalds@linux-foundation.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.