From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG][RFC] Miscalculated inflight counter in io_uring
Date: Fri, 25 Oct 2019 10:18:02 -0600 [thread overview]
Message-ID: <f8cc2933-8392-ee64-6356-5c5a39bc973e@kernel.dk> (raw)
In-Reply-To: <360f9685-084f-b174-ccee-5bfe92d0ad3a@gmail.com>
On 10/25/19 10:08 AM, Pavel Begunkov wrote:
> On 25/10/2019 18:32, Jens Axboe wrote:
>> On 10/25/19 3:48 AM, Pavel Begunkov wrote:
>>> In case of IORING_SETUP_SQPOLL | IORING_SETUP_IOPOLL:
>>>
>>> The @inflight count returned by io_submit_sqes() is the number of
>>> entries picked up from the sqring, including those that already
>>> completed or failed. Completed (but not failed) requests are placed
>>> into @poll_list.
>>>
>>> io_sq_thread() then tries to poll for @inflight events, even though the
>>> failed ones will never appear in @poll_list. Thus it will always think
>>> there is something left to poll (i.e. @inflight > 0).
>>>
>>> There are several issues with this:
>>> 1. io_sq_thread won't ever sleep
>>> 2. io_sq_thread() may be left running and actively polling even after
>>> the user process is destroyed
>>> 3. the same goes for the mm_struct and all the vmas of the user process
>>>
>>> TL;DR:
>>> While @inflight > 0, io_sq_thread won't put @cur_mm, which blocks
>>> recycling of the vmas used for the rings' mapping, and those hold a
>>> refcount on io_uring's struct file. Thus io_uring_release() is never
>>> called, and neither is kthread_{park,stop}(). That's the case where the
>>> user process hasn't unmapped the rings.
>>>
>>>
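Just to make sure we're reading the accounting the same way, here's a
heavily simplified sketch of the io_sq_thread() loop -- not the literal
kernel code, just the bookkeeping that matters for this bug:

```
while (!kthread_should_park()) {
	if (inflight) {
		unsigned nr_events = 0;

		if (ctx->flags & IORING_SETUP_IOPOLL)
			io_iopoll_check(ctx, &nr_events, 0);

		/* only requests that made it onto poll_list get reaped */
		inflight -= nr_events;
	}

	/* to_submit = number of new sqes available in the sq ring */

	/*
	 * cur_mm is only dropped (and the thread only goes to sleep)
	 * once to_submit is 0 and inflight has reached 0. With the
	 * inflated count we never get there.
	 */

	/* counts every consumed sqe, including those that failed submit */
	inflight += io_submit_sqes(ctx, to_submit, cur_mm != NULL,
				   mm_fault);
}
```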
>>> I'm not sure what the best way to fix this is:
>>> 1. try to put failed requests into @poll_list as well (grabbing the mutex).
>>>
>>> 2. test for the zero-inflight case by comparing the sq and cq, something like
>>> ```
>>> if (nr_polled == 0) {
>>> 	lock(comp_lock);
>>> 	if (cached_cq_head == cached_sq_tail)
>>> 		inflight = 0;
>>> 	unlock(comp_lock);
>>> }
>>> ```
>>> But that adds extra spinlock locking in the fast path. And it's unsafe
>>> to use the non-cached heads/tails, as they could be maliciously changed
>>> by userspace.
>>>
>>> 3. Do some counting of failed requests (probably needs atomics or other
>>> synchronisation)
>>>
>>> 4. something else?
>>
>> Can we just look at the completion count? Ala:
>>
>> 	prev_tail = ctx->cached_cq_tail;
>> 	inflight += io_submit_sqes(ctx, to_submit, cur_mm != NULL,
>> 				   mm_fault);
>> 	if (prev_tail != ctx->cached_cq_tail)
>> 		inflight -= (ctx->cached_cq_tail - prev_tail);
>>
>> or something like that.
>>
>
> Don't think so:
> 1. @cached_cq_tail is protected by @completion_lock (right?).
> You never know what happens when you violate the memory model.
> 2. If something has successfully completed by that time, we again get
> the wrong number.
>
> Basically, it's
> inflight = (cached_sq_head - cached_cq_tail) + len(poll_list)
> maybe you can figure out something from this.
>
> Idea 1:
> How about counting failed events and subtracting them?
> But as they may fail asynchronously, that needs synchronisation,
> e.g. atomic_add() on failure (slow path)
> and atomic_read() in the kthread (fast path)
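
For reference, I think your idea 1 would end up looking something like
this (the sq_fail_cnt field is made up, and this is just an untested
sketch):

```
/* hypothetical: an atomic_t sq_fail_cnt added to struct io_ring_ctx */

/* slow path -- wherever a picked-up sqe fails at submission time */
atomic_inc(&ctx->sq_fail_cnt);

/* fast path -- in io_sq_thread(), after reaping polled completions */
inflight -= nr_events;
inflight -= atomic_xchg(&ctx->sq_fail_cnt, 0);
```

Doable, but that puts an atomic in the submission path just to account
for the error case.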
How about we just go way simpler? We only really care about whether we
have any inflight requests for polling, or none at all. We don't care
about the exact count. So if we check the poll_list and it's empty, we
just reset inflight. That should handle it without any extra weird
accounting or racy logic.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9c128df3d675..afc463a06fdc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -874,19 +874,12 @@ static void io_iopoll_reap_events(struct io_ring_ctx *ctx)
 	mutex_unlock(&ctx->uring_lock);
 }
 
-static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned *nr_events,
-			   long min)
+static int __io_iopoll_check(struct io_ring_ctx *ctx, unsigned *nr_events,
+			     long min)
 {
-	int iters, ret = 0;
-
-	/*
-	 * We disallow the app entering submit/complete with polling, but we
-	 * still need to lock the ring to prevent racing with polled issue
-	 * that got punted to a workqueue.
-	 */
-	mutex_lock(&ctx->uring_lock);
+	int iters, ret;
 
-	iters = 0;
+	ret = iters = 0;
 	do {
 		int tmin = 0;
 
@@ -922,6 +915,21 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned *nr_events,
 		ret = 0;
 	} while (min && !*nr_events && !need_resched());
 
+	return ret;
+}
+
+static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned *nr_events,
+			   long min)
+{
+	int ret;
+
+	/*
+	 * We disallow the app entering submit/complete with polling, but we
+	 * still need to lock the ring to prevent racing with polled issue
+	 * that got punted to a workqueue.
+	 */
+	mutex_lock(&ctx->uring_lock);
+	ret = __io_iopoll_check(ctx, nr_events, min);
 	mutex_unlock(&ctx->uring_lock);
 	return ret;
 }
@@ -2658,7 +2666,12 @@ static int io_sq_thread(void *data)
 		unsigned nr_events = 0;
 
 		if (ctx->flags & IORING_SETUP_IOPOLL) {
-			io_iopoll_check(ctx, &nr_events, 0);
+			mutex_lock(&ctx->uring_lock);
+			if (!list_empty(&ctx->poll_list))
+				__io_iopoll_check(ctx, &nr_events, 0);
+			else
+				inflight = 0;
+			mutex_unlock(&ctx->uring_lock);
 		} else {
 			/*
 			 * Normal IO, just pretend everything completed.
--
Jens Axboe