Re: [PATCH 1/3] io_uring: add support for async work inheriting files table


From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-block@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
Date: Thu, 24 Oct 2019 18:35:53 -0600
Message-ID: <947c74b9-e828-e190-19fc-449c72a20798@kernel.dk>
In-Reply-To: <CAG48ez00zr2P1WCznnXmTvq+FQ4Ji8kDnuNqbeeMvOh_MhXeTg@mail.gmail.com>

On 10/24/19 5:13 PM, Jann Horn wrote:
> On Fri, Oct 25, 2019 at 12:04 AM Jens Axboe <axboe@kernel.dk> wrote:
>> On 10/24/19 2:31 PM, Jann Horn wrote:
>>> On Thu, Oct 24, 2019 at 9:41 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 10/18/19 12:50 PM, Jann Horn wrote:
>>>>> On Fri, Oct 18, 2019 at 8:16 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> On 10/18/19 12:06 PM, Jann Horn wrote:
>>>>>>> But actually, by the way: Is this whole files_struct thing creating a
>>>>>>> reference loop? The files_struct has a reference to the uring file,
>>>>>>> and the uring file has ACCEPT work that has a reference to the
>>>>>>> files_struct. If the task gets killed and the accept work blocks, the
>>>>>>> entire files_struct will stay alive, right?
>>>>>>
>>>>>> Yes, for the lifetime of the request, it does create a loop. So if the
>>>>>> application goes away, I think you're right, the files_struct will stay.
>>>>>> And so will the io_uring, for that matter, as we depend on the closing
>>>>>> of the files to do the final reap.
>>>>>>
>>>>>> Hmm, not sure how best to handle that, to be honest. We need some way to
>>>>>> break the loop, if the request never finishes.
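The reference loop described above can be modelled with two reference counts. What follows is a minimal single-threaded sketch, not the kernel code: it assumes hypothetical toy_files/toy_ring types standing in for files_struct and the io_uring context. The fd table keeps the ring alive, queued blocking work keeps the fd table alive, so the task dropping its own reference frees nothing until the pending work is cancelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: NOT the kernel's files_struct / io_ring_ctx. */
struct toy_files;

struct toy_ring {
	int refs;
	bool freed;
	struct toy_files *pending_files; /* ref held by queued ACCEPT-like work */
};

struct toy_files {
	int refs;
	bool freed;
	struct toy_ring *ring; /* the fd table keeps the ring file open */
};

static void toy_files_put(struct toy_files *files);

static void toy_ring_put(struct toy_ring *ring)
{
	assert(ring->refs > 0);
	if (--ring->refs == 0) {
		ring->freed = true;
		/* Final ring teardown drops the files ref held by queued work. */
		if (ring->pending_files) {
			struct toy_files *f = ring->pending_files;
			ring->pending_files = NULL;
			toy_files_put(f);
		}
	}
}

static void toy_files_put(struct toy_files *files)
{
	assert(files->refs > 0);
	if (--files->refs == 0) {
		files->freed = true;
		/* Tearing down the table closes the ring fd, dropping its ref. */
		if (files->ring) {
			struct toy_ring *r = files->ring;
			files->ring = NULL;
			toy_ring_put(r);
		}
	}
}

/* Queue work that pins the files table: the second edge of the cycle. */
static void toy_queue_work(struct toy_ring *ring, struct toy_files *files)
{
	files->refs++;
	ring->pending_files = files;
}

/* Cancel the pending work and drop its files reference, breaking the cycle. */
static void toy_cancel_work(struct toy_ring *ring)
{
	struct toy_files *f = ring->pending_files;

	if (f) {
		ring->pending_files = NULL;
		toy_files_put(f);
	}
}
```

For example, after toy_queue_work() and the task's own toy_files_put(), both objects stay allocated; a later toy_cancel_work() drops the last references and frees both. In this toy, that cancellation step plays the role that cancelling from f_op->flush() plays in the discussion that follows.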
>>>>>
>>>>> A wacky and dubious approach would be to, instead of taking a
>>>>> reference to the files_struct, abuse f_op->flush() to synchronously
>>>>> flush out pending requests with references to the files_struct... But
>>>>> it's probably a bad idea, given that in f_op->flush(), you can't
>>>>> easily tell which files_struct the close is coming from. I suppose you
>>>>> could keep a list of (fdtable, fd) pairs through which ACCEPT requests
>>>>> have come in and then let f_op->flush() probe whether the file
>>>>> pointers are gone from them...
>>>>
>>>> Got back to this after finishing the io-wq stuff, which we need for the
>>>> cancel.
>>>>
>>>> Here's an updated patch:
>>>>
>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=1ea847edc58d6a54ca53001ad0c656da57257570
>>>>
>>>> that seems to work for me (lightly tested), we correctly find and cancel
>>>> work that is holding on to the file table.
>>>>
>>>> The full series sits on top of my for-5.5/io_uring-wq branch, and can be
>>>> viewed here:
>>>>
>>>> http://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-test
>>>>
>>>> Let me know what you think!
>>>
>>> Ah, I didn't realize that the second argument to f_op->flush is a
>>> pointer to the files_struct. That's neat.
>>>
>>>
>>> Security: There is no guarantee that ->flush() will run after the last
>>> io_uring_enter() finishes. You can race like this, with threads A and
>>> B in one process and C in another one:
>>>
>>> A: sends uring fd to C via unix domain socket
>>> A: starts syscall io_uring_enter(fd, ...)
>>> A: calls fdget(fd), takes reference to file
>>> B: starts syscall close(fd)
>>> B: fd table entry is removed
>>> B: f_op->flush is invoked and finds no pending transactions
>>> B: syscall close() returns
>>> A: continues io_uring_enter(), grabbing current->files
>>> A: io_uring_enter() returns
>>> A and B: exit
>>> worker: use-after-free access to files_struct
>>>
>>> I think the solution to this would be (unless you're fine with adding
>>> some broad global read-write mutex) something like this in
>>> __io_queue_sqe(), where "fd" and "f" are the variables from
>>> io_uring_enter(), plumbed through the stack somehow:
>>>
>>> if (req->flags & REQ_F_NEED_FILES) {
>>>     rcu_read_lock();
>>>     spin_lock_irq(&ctx->inflight_lock);
>>>     if (fcheck(fd) == f) {
>>>       list_add(&req->inflight_list,
>>>         &ctx->inflight_list);
>>>       req->work.files = current->files;
>>>       ret = 0;
>>>     } else {
>>>       ret = -EBADF;
>>>     }
>>>     spin_unlock_irq(&ctx->inflight_lock);
>>>     rcu_read_unlock();
>>>     if (ret)
>>>       goto put_req;
>>> }
>>
>> First of all, thanks for the thorough look at this! We already have f
>> available here, it's req->file. And we just made a copy of the sqe, so
>> we have sqe->fd available as well. I fixed this up.
> 
> sqe->fd is the file descriptor we're doing I/O on, not the file
> descriptor of the uring file, right? Same thing for req->file. This
> check only detects whether the fd we're doing I/O on was closed, which
> is irrelevant.

Duh, yes, I'm an idiot. That's easily fixable; I'll update the check to use the ring fd.
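The corrected check compares the ring fd's current table entry with the ring file that io_uring_enter() looked up, not with sqe->fd or req->file. Below is a minimal single-threaded sketch under stated assumptions: toy_fdtable, toy_fcheck() and toy_grab_files() are hypothetical stand-ins for the fdtable, fcheck() and the REQ_F_NEED_FILES path. In the kernel, the comparison and the inflight-list insertion must also happen under RCU and the same lock that flush uses to scan the inflight list. The sketch omits that concurrency.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NR_FDS 16

/* Hypothetical stand-ins; the kernel uses fcheck(), RCU and inflight_lock. */
struct toy_file { int id; };

struct toy_fdtable {
	struct toy_file *fd[TOY_NR_FDS];
};

struct toy_ring_ctx {
	int nr_inflight; /* requests that pin the files table */
};

static struct toy_file *toy_fcheck(struct toy_fdtable *t, int fd)
{
	if (fd < 0 || fd >= TOY_NR_FDS)
		return NULL;
	return t->fd[fd];
}

/*
 * Pin the files table only if the *ring* fd still maps to the ring file
 * seen at syscall entry. If close() already removed it, flush may have
 * run and found nothing to cancel, so refuse with -EBADF instead.
 */
static int toy_grab_files(struct toy_ring_ctx *ctx, struct toy_fdtable *t,
			  int ring_fd, struct toy_file *ring_file)
{
	if (toy_fcheck(t, ring_fd) != ring_file)
		return -EBADF;
	ctx->nr_inflight++;
	return 0;
}

static void toy_close(struct toy_fdtable *t, int fd)
{
	if (fd >= 0 && fd < TOY_NR_FDS)
		t->fd[fd] = NULL;
}
```

In the race above, thread B's close() corresponds to toy_close(); once the entry is gone, thread A's late registration fails instead of leaving work that holds a files_struct which flush never saw.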

>>> Security + Correctness: If there is more than one io_wqe, it seems to
>>> me that io_uring_flush() calls io_wq_cancel_work(), which calls
>>> io_wqe_cancel_work(), which may return IO_WQ_CANCEL_OK if the first
>>> request it looks at is pending. In that case, io_wq_cancel_work() will
>>> immediately return, and io_uring_flush() will also immediately return.
>>> It looks like any other requests will continue running?
>>
>> Ah, good point, I missed that. We need to keep looping until we get
>> NOTFOUND returned. Fixed as well.
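The difference between stopping at the first successful cancellation and looping until nothing matches can be shown with a toy queue. This is a minimal sketch, assuming hypothetical toy_wq/toy_wqe types and a two-value toy_cancel enum. It mirrors only the shape of io_wq_cancel_work()/io_wqe_cancel_work() as described in the thread, not their code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_WQE   2
#define TOY_MAX_WORK 4

/* Hypothetical toy types; not the io-wq API. */
enum toy_cancel { TOY_CANCEL_OK, TOY_CANCEL_NOTFOUND };

struct toy_work {
	const void *files; /* files table this work pinned */
	bool pending;
};

struct toy_wqe {
	struct toy_work work[TOY_MAX_WORK];
};

struct toy_wq {
	struct toy_wqe wqe[TOY_NR_WQE];
};

/* Cancel the first pending work in one wqe that pins @files. */
static enum toy_cancel toy_wqe_cancel_one(struct toy_wqe *wqe, const void *files)
{
	for (size_t i = 0; i < TOY_MAX_WORK; i++) {
		struct toy_work *w = &wqe->work[i];

		if (w->pending && w->files == files) {
			w->pending = false;
			return TOY_CANCEL_OK;
		}
	}
	return TOY_CANCEL_NOTFOUND;
}

/* Buggy shape: returns as soon as any wqe cancels one item. */
static enum toy_cancel toy_wq_cancel_once(struct toy_wq *wq, const void *files)
{
	for (size_t i = 0; i < TOY_NR_WQE; i++)
		if (toy_wqe_cancel_one(&wq->wqe[i], files) != TOY_CANCEL_NOTFOUND)
			return TOY_CANCEL_OK;
	return TOY_CANCEL_NOTFOUND;
}

/* Fixed shape: keep cancelling until NOTFOUND; returns how many it cancelled. */
static size_t toy_wq_cancel_all_matching(struct toy_wq *wq, const void *files)
{
	size_t n = 0;

	while (toy_wq_cancel_once(wq, files) == TOY_CANCEL_OK)
		n++;
	return n;
}

static size_t toy_count_pending(const struct toy_wq *wq, const void *files)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_WQE; i++)
		for (size_t j = 0; j < TOY_MAX_WORK; j++)
			if (wq->wqe[i].work[j].pending && wq->wqe[i].work[j].files == files)
				n++;
	return n;
}
```

With matching work on both wqes, a single toy_wq_cancel_once() leaves one item running, while toy_wq_cancel_all_matching() drains every item that pins the given files table and leaves unrelated work alone.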
>>
>> Also added cancellation if the task is going away. Here's the
>> incremental patch, I'll resend with the full version.
> [...]
>> +static int io_uring_flush(struct file *file, void *data)
>> +{
>> +       struct io_ring_ctx *ctx = file->private_data;
>> +
>> +       if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
>> +               io_wq_cancel_all(ctx->io_wq);
> 
> Looking at io_wq_cancel_all(), this will just send a signal to the
> task without waiting for anything, right? Isn't that unsafe?

Yes, that's a logic error; we should always call io_uring_cancel_files()
and only additionally cancel everything if the task is exiting.
Something like:

	io_uring_cancel_files(ctx, data);
	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
		io_wq_cancel_all(ctx->io_wq);
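The ordering above can be sketched with a toy request list. This is a minimal model, not the io_uring implementation: toy_ctx, toy_cancel_files(), toy_cancel_all() and toy_flush() are hypothetical. The bool exiting parameter stands in for fatal_signal_pending(current) || (current->flags & PF_EXITING). Cancellation here is synchronous, which glosses over the waiting the real files cancellation has to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REQS 8

/* Hypothetical toy ring; not the kernel's io_ring_ctx. */
struct toy_req {
	const void *files; /* NULL if the request does not pin a files table */
	bool active;
};

struct toy_ctx {
	struct toy_req reqs[TOY_MAX_REQS];
};

/* Retire every active request that pins @files. */
static void toy_cancel_files(struct toy_ctx *ctx, const void *files)
{
	for (size_t i = 0; i < TOY_MAX_REQS; i++)
		if (ctx->reqs[i].active && ctx->reqs[i].files == files)
			ctx->reqs[i].active = false;
}

/* Retire every active request, regardless of which files table it pins. */
static void toy_cancel_all(struct toy_ctx *ctx)
{
	for (size_t i = 0; i < TOY_MAX_REQS; i++)
		ctx->reqs[i].active = false;
}

/* Always cancel work pinning the closing table; cancel everything on exit. */
static void toy_flush(struct toy_ctx *ctx, const void *files, bool exiting)
{
	toy_cancel_files(ctx, files);
	if (exiting)
		toy_cancel_all(ctx);
}

static size_t toy_active(const struct toy_ctx *ctx)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_MAX_REQS; i++)
		if (ctx->reqs[i].active)
			n++;
	return n;
}
```

A close from a task that is not exiting retires only the requests pinning that task's files table. A close from an exiting task retires everything.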

Thanks!

-- 
Jens Axboe

Thread overview: 28+ messages
2019-10-17 21:28 [PATCHSET] io_uring: add support for accept(4) Jens Axboe
2019-10-17 21:28 ` [PATCH 1/3] io_uring: add support for async work inheriting files table Jens Axboe
2019-10-18  2:41   ` Jann Horn
2019-10-18 14:01     ` Jens Axboe
2019-10-18 14:34       ` Jann Horn
2019-10-18 14:37         ` Jens Axboe
2019-10-18 14:40           ` Jann Horn
2019-10-18 14:43             ` Jens Axboe
2019-10-18 14:52               ` Jann Horn
2019-10-18 15:00                 ` Jens Axboe
2019-10-18 15:54                   ` Jens Axboe
2019-10-18 16:20                     ` Jann Horn
2019-10-18 16:36                       ` Jens Axboe
2019-10-18 17:05                         ` Jens Axboe
2019-10-18 18:06                           ` Jann Horn
2019-10-18 18:16                             ` Jens Axboe
2019-10-18 18:50                               ` Jann Horn
2019-10-24 19:41                                 ` Jens Axboe
2019-10-24 20:31                                   ` Jann Horn
2019-10-24 22:04                                     ` Jens Axboe
2019-10-24 22:09                                       ` Jens Axboe
2019-10-24 23:13                                       ` Jann Horn
2019-10-25  0:35                                         ` Jens Axboe [this message]
2019-10-25  0:52                                           ` Jens Axboe
2019-10-23 12:04   ` Wolfgang Bumiller
2019-10-23 14:11     ` Jens Axboe
2019-10-17 21:28 ` [PATCH 2/3] net: add __sys_accept4_file() helper Jens Axboe
2019-10-17 21:28 ` [PATCH 3/3] io_uring: add support for IORING_OP_ACCEPT Jens Axboe