From: Al Viro <viro@zeniv.linux.org.uk>
To: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-aio@kvack.org, linux-block@vger.kernel.org,
Linux API <linux-api@vger.kernel.org>,
hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Mon, 4 Feb 2019 02:56:12 +0000 [thread overview]
Message-ID: <20190204025612.GR2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAG48ez2kLWUEambWesQQaeVn_u=PtjuMeDUc49yamhYspfd7Sg@mail.gmail.com>
On Wed, Jan 30, 2019 at 02:29:05AM +0100, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@kernel.dk> wrote:
> > We normally have to fget/fput for each IO we do on a file. Even with
> > the batching we do, the cost of the atomic inc/dec of the file usage
> > count adds up.
> >
> > This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
> > for the io_uring_register(2) system call. The arguments passed in must
> > be an array of __s32 holding file descriptors, and nr_args should hold
> > the number of file descriptors the application wishes to pin for the
> > duration of the io_uring context (or until IORING_UNREGISTER_FILES is
> > called).
> >
> > When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
> > member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
> > to the index in the array passed in to IORING_REGISTER_FILES.
> >
> > Files are automatically unregistered when the io_uring context is
> > torn down. An application need only unregister if it wishes to
> > register a new set of fds.
>
> Crazy idea:
>
> Taking a step back, at a high level, basically this patch creates sort
> of the same difference that you get when you compare the following
> scenarios for normal multithreaded I/O in userspace:
> This kinda makes me wonder whether this is really something that
> should be implemented specifically for the io_uring API, or whether it
> would make sense to somehow handle part of this in the generic VFS
> code and give the user the ability to prepare a new files_struct that
> can then be transferred to the worker thread, or something like
> that... I'm not sure whether there's a particularly clean way to do
> that though.
Using files_struct for that opens a can of worms you really don't
want to touch.
Consider the following scenario with any variant of this interface:
* create io_uring fd.
* send an SCM_RIGHTS with that fd to AF_UNIX socket.
* add the descriptor of that AF_UNIX socket to your fd
* close AF_UNIX fd, close io_uring fd.
Voila - you've got a shiny leak. No ->release() is called for
anyone (and you really don't want to do that on ->flush(), because
otherwise a library helper doing e.g. system("/bin/date") will tear
down all the io_uring in your process). The socket is held by
the reference you've stashed into io_uring (whichever way you do
that). io_uring is held by the reference you've stashed into
SCM_RIGHTS datagram in queue of the socket.
No matter what, you need net/unix/garbage.c to be aware of that stuff.
And getting files_struct lifetime mixed into that would be beyond
any reason.
The only reason for doing that as a descriptor table would be
avoiding the cost of fget() in whatever uses it, right? Since
those are *not* the normal syscalls (and fdget() really should not
be used anywhere other than the very top of syscall's call chain -
that's another reason why tossing file_struct around like that
is insane) and since the benefit is all due to the fact that it's
*NOT* shared, *NOT* modified in parallel, etc., allowing us to
treat file references as stable... why the hell use the descriptor
tables at all?
All you need is an array of struct file *, explicitly populated.
With net/unix/garbage.c aware of such beasts. Guess what? We
do have such an object already. The one net/unix/garbage.c is
working with. SCM_RIGHTS datagrams, that is.
IOW, can't we give those io_uring descriptors associated struct
unix_sock? No socket descriptors, no struct socket (probably),
just the AF_UNIX-specific part thereof. Then teach
unix_inflight()/unix_notinflight() about getting unix_sock out
of these guys (incidentally, both would seem to benefit from
_not_ touching unix_gc_lock in case when there's no unix_sock
attached to file we are dealing with - I might be missing
something very subtle about barriers there, but it doesn't
look likely).
And make that (i.e. registering the descriptors) mandatory.
Hell, combine that with creating io_uring fd, if we really
care about the syscall count. Benefits:
* no file_struct refcount wanking
* no fget()/fput() (conditional, at that) from kernel
threads
* no CLOEXEC-dependent anything; just the teardown
on the final fput(), whichever way it comes.
* no fun with duelling garbage collectors.
next prev parent reply other threads:[~2019-02-04 2:56 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-29 19:26 [PATCHSET v9] io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 01/18] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-29 19:26 ` [PATCH 02/18] block: wire up block device iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 03/18] block: add bio_set_polled() helper Jens Axboe
2019-01-29 19:26 ` [PATCH 04/18] iomap: wire up the iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 05/18] Add io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 06/18] io_uring: add fsync support Jens Axboe
2019-01-29 19:26 ` [PATCH 07/18] io_uring: support for IO polling Jens Axboe
2019-01-29 20:47 ` Jann Horn
2019-01-29 20:56 ` Jens Axboe
2019-01-29 21:10 ` Jann Horn
2019-01-29 21:33 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 08/18] fs: add fget_many() and fput_many() Jens Axboe
2019-01-29 19:26 ` [PATCH 09/18] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-29 23:31 ` Jann Horn
2019-01-29 23:44 ` Jens Axboe
2019-01-30 15:33 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 10/18] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-29 19:26 ` [PATCH 11/18] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-29 19:26 ` [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-29 22:44 ` Jann Horn
2019-01-29 22:56 ` Jens Axboe
2019-01-29 23:03 ` Jann Horn
2019-01-29 23:06 ` Jens Axboe
2019-01-29 23:08 ` Jann Horn
2019-01-29 23:14 ` Jens Axboe
2019-01-29 23:42 ` Jann Horn
2019-01-29 23:51 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30 1:29 ` Jann Horn
2019-01-30 15:35 ` Jens Axboe
2019-02-04 2:56 ` Al Viro [this message]
2019-02-05 2:19 ` Jens Axboe
2019-02-05 17:57 ` Jens Axboe
2019-02-05 19:08 ` Jens Axboe
2019-02-06 0:27 ` Jens Axboe
2019-02-06 1:01 ` Al Viro
2019-02-06 17:56 ` Jens Axboe
2019-02-07 4:05 ` Al Viro
2019-02-07 16:14 ` Jens Axboe
2019-02-07 16:30 ` Al Viro
2019-02-07 16:35 ` Jens Axboe
2019-02-07 16:51 ` Al Viro
2019-02-06 0:56 ` Al Viro
2019-02-06 13:41 ` Jens Axboe
2019-02-07 4:00 ` Al Viro
2019-02-07 9:22 ` Miklos Szeredi
2019-02-07 13:31 ` Al Viro
2019-02-07 14:20 ` Miklos Szeredi
2019-02-07 15:20 ` Al Viro
2019-02-07 15:27 ` Miklos Szeredi
2019-02-07 16:26 ` Al Viro
2019-02-07 19:08 ` Miklos Szeredi
2019-02-07 18:45 ` Jens Axboe
2019-02-07 18:58 ` Jens Axboe
2019-02-11 15:55 ` Jonathan Corbet
2019-02-11 17:35 ` Al Viro
2019-02-11 20:33 ` Jonathan Corbet
2019-01-29 19:26 ` [PATCH 14/18] io_uring: add submission polling Jens Axboe
2019-01-29 19:26 ` [PATCH 15/18] io_uring: add io_kiocb ref count Jens Axboe
2019-01-29 19:27 ` [PATCH 16/18] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-01-29 19:27 ` [PATCH 17/18] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-01-29 19:27 ` [PATCH 18/18] io_uring: add io_uring_event cache hit information Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2019-02-07 19:55 [PATCHSET v12] io_uring IO interface Jens Axboe
2019-02-07 19:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-02-08 12:17 ` Alan Jenkins
2019-02-08 12:57 ` Jens Axboe
2019-02-08 14:02 ` Alan Jenkins
2019-02-08 15:13 ` Jens Axboe
2019-02-12 12:29 ` Alan Jenkins
2019-02-12 15:17 ` Jens Axboe
2019-02-12 17:21 ` Alan Jenkins
2019-02-12 17:33 ` Jens Axboe
2019-02-12 20:23 ` Alan Jenkins
2019-02-12 21:10 ` Jens Axboe
2019-02-01 15:23 [PATCHSET v11] io_uring IO interface Jens Axboe
2019-02-01 15:24 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30 21:55 [PATCHSET v10] io_uring IO interface Jens Axboe
2019-01-30 21:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-28 21:35 [PATCHSET v8] io_uring IO interface Jens Axboe
2019-01-28 21:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-29 16:36 ` Jann Horn
2019-01-29 18:13 ` Jens Axboe
2019-01-23 15:35 [PATCHSET v7] io_uring IO interface Jens Axboe
2019-01-23 15:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190204025612.GR2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=avi@scylladb.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=jannh@google.com \
--cc=jmoyer@redhat.com \
--cc=linux-aio@kvack.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).