linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Thu, 7 Feb 2019 11:58:21 -0700	[thread overview]
Message-ID: <39260843-9918-be3b-7f0f-3d034ec3e7a8@kernel.dk> (raw)
In-Reply-To: <73e23146-2138-5a46-46ed-9c7f1f912a04@kernel.dk>

On 2/7/19 11:45 AM, Jens Axboe wrote:
> On 2/6/19 9:00 PM, Al Viro wrote:
>> On Wed, Feb 06, 2019 at 06:41:00AM -0700, Jens Axboe wrote:
>>> On 2/5/19 5:56 PM, Al Viro wrote:
>>>> On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
>>>>> Proof is in the pudding, here's the main commit introducing io_uring
>>>>> and now wiring it up to the AF_UNIX garbage collection:
>>>>>
>>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=io_uring&id=158e6f42b67d0abe9ee84886b96ca8c4b3d3dfd5
>>>>>
>>>>> How does that look?
>>>>
>>>> In a word - wrong.  Some theory: garbage collector assumes that there is
>>>> a subset of file references such that
>>>> 	* for all files with such references there's an associated unix_sock.
>>>> 	* all such references are stored in SCM_RIGHTS datagrams that can be
>>>> found by the garbage collector (currently: for data-bearing AF_UNIX sockets -
>>>> queued SCM_RIGHTS datagrams, for listeners - SCM_RIGHTS datagrams sent via
>>>> yet-to-be-accepted connections).
>>>> 	* there is an efficient way to count those references for given file
>>>> (->inflight of the corresponding unix_sock).
>>>> 	* removal of those references would render the graph acyclic.
>>>> 	* file can _NOT_ be subject to syscalls unless there are references
>>>> to it outside of that subset.
>>>
>>> IOW, we cannot use fget() for registering files, and we still need fget/fput
>>> in the fast path to retain safe use of the file. If I'm understanding you
>>> correctly?
>>
>> No.  *ALL* references (inflight and not) are the same for file->f_count.
>> unix_inflight() does not grab a new reference to file; it only says that
>> reference passed to it by the caller is now an in-flight one.
>>
>> OK, braindump time:
> 
> [snip]
> 
> This is great info, and I think it belongs in Documentation/ somewhere.
> Not sure I've ever seen such a good and detailed dump of this before.
> 
>> What you are about to add is *ANOTHER* kind of loops - references
>> to files in the "registered" set are pinned down by owning io_uring.
>>
>> That would invalidate just about every assumption made the garbage
>> collector - even if you forbid to register io_uring itself, you
>> still can register both ends of AF_UNIX socket pair, then pass
>> io_uring in SCM_RIGHTS over that, then close all descriptors involved.
>> From the garbage collector point of view all sockets have external
>> references, so there's nothing to collect.  In fact those external
>> references are only reachable if you have a reachable reference
>> to io_uring, so we get a leak.
>>
>> To make it work:
>> 	* have unix_sock created for each io_uring (as your code does)
>> 	* do *NOT* have unix_inflight() done at that point - it's
>> completely wrong there.
>> 	* file set registration becomes
>> 		* create and populate SCM_RIGHTS, with the same
>> fget()+grab an extra reference + unix_inflight() sequence.
>> Don't forget to have skb->destructor set to unix_destruct_scm
>> or equivalent thereof.
>> 		* remember UNIXCB(skb).fp - that'll give you your
>> array of struct file *, to use in lookups.
>> 		* queue it into your unix_sock
>> 		* do _one_ fput() for everything you've grabbed,
>> dropping one of two references you've taken.
>> 	* unregistering is simply skb_dequeue() + kfree_skb().
>> 	* in ->release() you do sock_release(); it'll do
>> everything you need (including unregistering the set, etc.)
> 
> This is genius! I implemented this and it works. I've verified that the
> previous test app failed to release due to the loop, and with this in
> place, once the GC kicks in, the io_uring is released appropriately.
> 
>> The hairiest part is the file set registration, of course -
>> there's almost certainly a helper or two buried in that thing;
>> simply exposing all the required net/unix/af_unix.c bits is
>> ucking fugly.
> 
> Outside of the modification to unix_get_socket(), the only change I had
> to make was to ensure that unix_destruct_scm() is available to io_uring.
> No other changes needed.
> 
>> I'm not sure what you propose for non-registered descriptors -
>> _any_ struct file reference that outlives the return from syscall
>> stored in some io_uring-attached data structure is has exact same
>> loop (and leak) problem.  And if you mean to have it dropped before
>> return from syscall, I'm afraid I don't follow you.  How would
>> that be done?
>>
>> Again, "io_uring descriptor can't be used in those requests" does
>> not help at all - use a socket instead, pass the io_uring fd over
>> it in SCM_RIGHTS and you are back to square 1.
> 
> I wasn't proposing to fput() before return, otherwise I can't hang on to
> that file *.
> 
> Right now for async punt, we don't release the reference, and then we
> fput() when IO completes. According to what you're saying here, that's
> not good enough. Correct me if I'm wrong, but what if we:
> 
> 1) For non-sock/io_uring fds, the current approach is sufficient
> 2) Disallow io_uring fd, doesn't make sense anyway
> 
> That leaves the socket fd, which is problematic. Should be solvable by
> allocating an skb and marking that file inflight?

Actually, we can just NOT set NOWAIT for types we don't support. That
means we'll never punt to async context for those.

-- 
Jens Axboe


  reply	other threads:[~2019-02-07 18:58 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-29 19:26 [PATCHSET v9] io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 01/18] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-29 19:26 ` [PATCH 02/18] block: wire up block device iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 03/18] block: add bio_set_polled() helper Jens Axboe
2019-01-29 19:26 ` [PATCH 04/18] iomap: wire up the iopoll method Jens Axboe
2019-01-29 19:26 ` [PATCH 05/18] Add io_uring IO interface Jens Axboe
2019-01-29 19:26 ` [PATCH 06/18] io_uring: add fsync support Jens Axboe
2019-01-29 19:26 ` [PATCH 07/18] io_uring: support for IO polling Jens Axboe
2019-01-29 20:47   ` Jann Horn
2019-01-29 20:56     ` Jens Axboe
2019-01-29 21:10       ` Jann Horn
2019-01-29 21:33         ` Jens Axboe
2019-01-29 19:26 ` [PATCH 08/18] fs: add fget_many() and fput_many() Jens Axboe
2019-01-29 19:26 ` [PATCH 09/18] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-29 23:31   ` Jann Horn
2019-01-29 23:44     ` Jens Axboe
2019-01-30 15:33       ` Jens Axboe
2019-01-29 19:26 ` [PATCH 10/18] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-29 19:26 ` [PATCH 11/18] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-29 19:26 ` [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-29 22:44   ` Jann Horn
2019-01-29 22:56     ` Jens Axboe
2019-01-29 23:03       ` Jann Horn
2019-01-29 23:06         ` Jens Axboe
2019-01-29 23:08           ` Jann Horn
2019-01-29 23:14             ` Jens Axboe
2019-01-29 23:42               ` Jann Horn
2019-01-29 23:51                 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30  1:29   ` Jann Horn
2019-01-30 15:35     ` Jens Axboe
2019-02-04  2:56     ` Al Viro
2019-02-05  2:19       ` Jens Axboe
2019-02-05 17:57         ` Jens Axboe
2019-02-05 19:08           ` Jens Axboe
2019-02-06  0:27             ` Jens Axboe
2019-02-06  1:01               ` Al Viro
2019-02-06 17:56                 ` Jens Axboe
2019-02-07  4:05                   ` Al Viro
2019-02-07 16:14                     ` Jens Axboe
2019-02-07 16:30                       ` Al Viro
2019-02-07 16:35                         ` Jens Axboe
2019-02-07 16:51                         ` Al Viro
2019-02-06  0:56             ` Al Viro
2019-02-06 13:41               ` Jens Axboe
2019-02-07  4:00                 ` Al Viro
2019-02-07  9:22                   ` Miklos Szeredi
2019-02-07 13:31                     ` Al Viro
2019-02-07 14:20                       ` Miklos Szeredi
2019-02-07 15:20                         ` Al Viro
2019-02-07 15:27                           ` Miklos Szeredi
2019-02-07 16:26                             ` Al Viro
2019-02-07 19:08                               ` Miklos Szeredi
2019-02-07 18:45                   ` Jens Axboe
2019-02-07 18:58                     ` Jens Axboe [this message]
2019-02-11 15:55                     ` Jonathan Corbet
2019-02-11 17:35                       ` Al Viro
2019-02-11 20:33                         ` Jonathan Corbet
2019-01-29 19:26 ` [PATCH 14/18] io_uring: add submission polling Jens Axboe
2019-01-29 19:26 ` [PATCH 15/18] io_uring: add io_kiocb ref count Jens Axboe
2019-01-29 19:27 ` [PATCH 16/18] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-01-29 19:27 ` [PATCH 17/18] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-01-29 19:27 ` [PATCH 18/18] io_uring: add io_uring_event cache hit information Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-02-07 19:55 [PATCHSET v12] io_uring IO interface Jens Axboe
2019-02-07 19:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-02-08 12:17   ` Alan Jenkins
2019-02-08 12:57     ` Jens Axboe
2019-02-08 14:02       ` Alan Jenkins
2019-02-08 15:13         ` Jens Axboe
2019-02-12 12:29           ` Alan Jenkins
2019-02-12 15:17             ` Jens Axboe
2019-02-12 17:21               ` Alan Jenkins
2019-02-12 17:33                 ` Jens Axboe
2019-02-12 20:23                   ` Alan Jenkins
2019-02-12 21:10                     ` Jens Axboe
2019-02-01 15:23 [PATCHSET v11] io_uring IO interface Jens Axboe
2019-02-01 15:24 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30 21:55 [PATCHSET v10] io_uring IO interface Jens Axboe
2019-01-30 21:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-28 21:35 [PATCHSET v8] io_uring IO interface Jens Axboe
2019-01-28 21:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-29 16:36   ` Jann Horn
2019-01-29 18:13     ` Jens Axboe
2019-01-23 15:35 [PATCHSET v7] io_uring IO interface Jens Axboe
2019-01-23 15:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39260843-9918-be3b-7f0f-3d034ec3e7a8@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=avi@scylladb.com \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).