linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@lst.de>
Cc: Miklos Szeredi <mszeredi@suse.cz>,
	linux-aio@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] fs: remove ki_nbytes
Date: Mon, 2 Feb 2015 08:14:31 +0000	[thread overview]
Message-ID: <20150202081431.GX29656@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20150202081112.GW29656@ZenIV.linux.org.uk>

On Mon, Feb 02, 2015 at 08:11:12AM +0000, Al Viro wrote:
> On Mon, Feb 02, 2015 at 09:07:29AM +0100, Christoph Hellwig wrote:
> > On Sat, Jan 31, 2015 at 06:08:41AM +0000, Al Viro wrote:
> > > and be done with that.  I'll put drivers/usb/gadget patches into a stable
> > > branch and ask use folks to pull from it - that's the simplest of this
> > > series, actually...
> > 
> > use folks?  Btw, does this mean you patches to switch it over, or do
> 
> usb.
> 
> > you want me to finish my conversion.
> 
> I have f_fs.c conversion and partial legacy/inode.c one.  Will post tomorrow,
> about to fall asleep right now...

FWIW, here's the current situation with read/write methods:

1) a bunch of file_operations instances have only ->read() and/or ->write();
no ->aio_... or ->..._iter versions, no ->splice_write().  Those are basically
device drivers:
	readv is equivalent to loop over segments, with read on each.
	writev is equivalent to loop over segments, with write on each.
	AIO read and write are all synchronous
	splice_write is does kmap on each buffer in turn and write that
Note that splitting a buffer into several adjacent pieces and submitting
a vectored IO is *not* guaranteed to be equivalent to plain IO on the entire
buffer - it has datagram-like semantics and boundaries do matter.

2) regular ->read_iter/->write_iter users:
	->read is new_sync_read() (or NULL, if ->read_iter is NULL)
	->write is new_sync_write() (or NULL, if ->write_iter is NULL)
	->aio_read and ->aio_write are NULL
	->splice_write is iter_splice_write (or NULL, if ->write_iter is NULL)
Those are stream-style ones; what matter is concatenation of the buffers, not
the boundaries.  Right now the following is true: ->read_iter/->write_iter on
a sync kiocb never returns EIOCBQUEUED and iterator argument is advanced
exactly by the amount of data transferred.

3) ones like (2), but with NULL ->splice_write() in spite of present
->write_iter().  AFAICS they can be bulk-converted to (2).  This is what the
bulk of irregularities boil down to.

4) ones like (2) or (3), except ->read() and ->write() are left NULL instead
of doing new_sync_{read,write}().  Mostly equivalent to (2); some codepaths
call ->read() or ->write() directly, instead of going through vfs_... wrappers,
and for those it can be a bit of a headache.  In any case, there are very
few such instances (only 3) and they are trivial to convert to (2).

5) Two instances in fs/9p have ->write_iter() *and* ->write(), the latter not
being new_sync_write().  Ditto for ->read_iter() and ->read().  No other
instance has such combinations.  FWIW, they are *almost* regular - their
->read() and ->write() are new_sync_... unless it's an O_DIRECT file.
It might make sense to try and convert those to ->direct_IO(); as it is, their
O_DIRECT is ignored for readv()/writev(), which is arguably a bug.

6) drivers/char/mem.c stuff; they are equivalent to (2), but somewhat
optimised.  Some of that might make sense to convert to (2); there _are_ hot
paths involved, so we'd better be careful there.

7) bad_file_ops.  It is equivalent to (2); it's also a very special
case.  FWIW, I'm not at all sure that we *need* most of the methods in
there.  We never replace ->f_op of an opened file with that, so if we
end up with that sucker, it happens on fresh open of something that had
its ->i_fop replaced with bad_file_ops.  And that will instantly fail in
bad_file_open().  Why do we need the rest of the methods in there?
AFAICS, we should drop all but ->open().

8) hypfs - AFAICS, converts to (2) easily; I'll do that.

9) FUSE - ->aio_read/->aio_write user, with unusual ->splice_write as
well.  I _think_ it can be converted to (2), but that'll take a bit of
work.

10) sockets; conversion to ->read_iter/->write_iter had been posted to
netdev, ->splice_write() (the only user of ->sendpage(), BTW) is a work
for the next cycle.

11) NTFS with rw support enabled.  ->aio_write() user, needs to be converted
to ->write_iter().  AFAICS, it wasn't particularly high priority for Anton
(along with all rw stuff in fs/ntfs); it doesn't look terribly hard, but then
it wasn't a high priority for me either.

12) virtio_console.  Interesting ->splice_write() there; hadn't looked deep
enough.

13) two odd drivers/usb/gadget instances.  I have conversion for f_fs.c,
but legacy/inode.c (ep_read() et.al.) is trickier.  The problem in there
is that writev() on a single-element vector is *not* equivalent to plain
write().  The former treats the wrong-direction endpoint as EINVAL; the
latter does
                if (usb_endpoint_xfer_isoc(&data->desc)) {
                        mutex_unlock(&data->lock);
                        return -EINVAL;
                }
                DBG (data->dev, "%s halt\n", data->name);
                spin_lock_irq (&data->dev->lock);
                if (likely (data->ep != NULL))
                        usb_ep_set_halt (data->ep);
                spin_unlock_irq (&data->dev->lock);
                mutex_unlock(&data->lock);
                return -EBADMSG;
instead.  IOW, for isochronous endpoints behaviour is the same, but the
rest behaves differently.  If not for that, that sucker would convert
to (3) easily; ->splice_write() semantics is potentially interesting -
the question is where do we want transfer boundaries.  As it is, writev()
collects the entire iovec and shoves it into single transfer; splice() to
that thing ends up with each pipe_buf going in a separate transfer, mergable
or not.  I would really appreciate comments on semantics of that thing from
USB folks...

14) ipathfs and qibfs: seriously different semantics for write and writev/AIO
write.  As in "different set of commands recognized"; AIO write plays like
writev, whether it's vectored or not (and it's always synchronous).
I've no idea who had come up with that... highly innovative API or why
hadn't they simply added two files (it's all on their virtual filesystem,
so they had full control of layout) rather that multiplexing two different
command sets in such a fashion.

15) /dev/snd/pcmC*D*[cp].  Again, different semantics for write and writev,
with the latter wanting nr_seqs equal to the number of channels.  AIO
non-vectored write fails unless there's only one channel.  Not sure how
ALSA userland uses that thing; AIO side is always synchronous, so it might
be simply never used.  FWIW, I'm not sure that write() on a single-channel
one is equivalent to 1-element writev() - silencing-related logics seem to
differ.

That's it.

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

  reply	other threads:[~2015-02-02  8:14 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-27 17:55 [RFC] split struct kiocb Christoph Hellwig
2015-01-27 17:55 ` [PATCH 1/5] fs: don't allow to complete sync iocbs through aio_complete Christoph Hellwig
2015-01-28 15:30   ` Miklos Szeredi
2015-01-28 16:57     ` Christoph Hellwig
2015-01-31  3:01       ` Maxim Patlasov
2015-01-27 17:55 ` [PATCH 2/5] fs: saner aio_complete prototype Christoph Hellwig
2015-01-31 10:04   ` Al Viro
2015-01-27 17:55 ` [PATCH 3/5] fs: remove ki_nbytes Christoph Hellwig
2015-01-31  6:08   ` Al Viro
2015-02-02  8:07     ` Christoph Hellwig
2015-02-02  8:11       ` Al Viro
2015-02-02  8:14         ` Al Viro [this message]
2015-02-02 14:26           ` Christoph Hellwig
2015-02-04  8:34             ` Al Viro
2015-02-04 18:17               ` Alan Stern
2015-02-04 19:06                 ` Al Viro
2015-02-04 20:30                   ` Alan Stern
2015-02-04 23:07                     ` Al Viro
2015-02-05  8:24                       ` Robert Baldyga
2015-02-05  8:47                         ` Al Viro
2015-02-05  9:03                           ` Al Viro
2015-02-05  9:15                             ` Robert Baldyga
     [not found]                       ` <20150204230733.GK29656-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2015-02-05 15:29                         ` Alan Stern
2015-02-06  7:03                           ` Al Viro
     [not found]                             ` <20150206070350.GX29656-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2015-02-06  8:44                               ` Robert Baldyga
2015-02-07  5:44                           ` Al Viro
2015-02-07  5:48                             ` [PATCH 1/6] new helper: dup_iter() Al Viro
2015-02-07  5:48                             ` [PATCH 2/6] gadget/function/f_fs.c: close leaks Al Viro
2015-02-07  5:48                             ` [PATCH 3/6] gadget/function/f_fs.c: use put iov_iter into io_data Al Viro
2015-02-07  5:48                             ` [PATCH 4/6] gadget/function/f_fs.c: switch to ->{read,write}_iter() Al Viro
2015-02-07  5:48                             ` [PATCH 5/6] gadgetfs: use-after-free in ->aio_read() Al Viro
2015-02-07  5:48                             ` [PATCH 6/6] gadget: switch ep_io_operations to ->read_iter/->write_iter Al Viro
2015-02-02 14:20         ` [PATCH 3/5] fs: remove ki_nbytes Christoph Hellwig
2015-01-27 17:55 ` [PATCH 4/5] fs: split generic and aio kiocb Christoph Hellwig
2015-01-27 17:55 ` [PATCH 5/5] fs: add async read/write interfaces Christoph Hellwig
2015-01-31  6:29   ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150202081431.GX29656@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hch@lst.de \
    --cc=linux-aio@kvack.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mszeredi@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).