From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
io-uring@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org, Anuj Gupta <anuj20.g@samsung.com>,
Kanchan Joshi <joshi.k@samsung.com>
Subject: Re: [PATCH v5 3/5] io_uring: count CQEs in io_iopoll_check()
Date: Sat, 7 Mar 2026 10:35:27 +0800 [thread overview]
Message-ID: <aauO72ocnZRhJkiA@fedora> (raw)
In-Reply-To: <CADUfDZp0shyZ5FqfEcwbi0tHXOFqwqZKRvwQW=heR-yvaOaw0Q@mail.gmail.com>
On Fri, Mar 06, 2026 at 05:38:15PM -0800, Caleb Sander Mateos wrote:
> On Wed, Mar 4, 2026 at 8:29 AM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > On 3/4/26 8:46 AM, Caleb Sander Mateos wrote:
> > > On Wed, Mar 4, 2026 at 2:33?AM Ming Lei <ming.lei@redhat.com> wrote:
> > >>
> > >> On Mon, Mar 02, 2026 at 10:29:12AM -0700, Caleb Sander Mateos wrote:
> > >>> A subsequent commit will allow uring_cmds that don't use iopoll on
> > >>> IORING_SETUP_IOPOLL io_urings. As a result, CQEs can be posted without
> > >>> setting the iopoll_completed flag for a request in iopoll_list or going
> > >>> through task work. For example, a UBLK_U_IO_FETCH_IO_CMDS command could
> > >>> call io_uring_mshot_cmd_post_cqe() to directly post a CQE. The
> > >>> io_iopoll_check() loop currently only counts completions posted in
> > >>> io_do_iopoll() when determining whether the min_events threshold has
> > >>> been met. It also exits early if there are any existing CQEs before
> > >>> polling, or if any CQEs are posted while running task work. CQEs posted
> > >>> via io_uring_mshot_cmd_post_cqe() or other mechanisms won't be counted
> > >>> against min_events.
> > >>>
> > >>> Explicitly check the available CQEs in each io_iopoll_check() loop
> > >>> iteration to account for CQEs posted in any fashion.
> > >>>
> > >>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > >>> ---
> > >>> io_uring/io_uring.c | 9 ++-------
> > >>> 1 file changed, 2 insertions(+), 7 deletions(-)
> > >>>
> > >>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > >>> index 46f39831d27c..b4625695bb3a 100644
> > >>> --- a/io_uring/io_uring.c
> > >>> +++ b/io_uring/io_uring.c
> > >>> @@ -1184,11 +1184,10 @@ __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
> > >>> io_move_task_work_from_local(ctx);
> > >>> }
> > >>>
> > >>> static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
> > >>> {
> > >>> - unsigned int nr_events = 0;
> > >>> unsigned long check_cq;
> > >>>
> > >>> min_events = min(min_events, ctx->cq_entries);
> > >>>
> > >>> lockdep_assert_held(&ctx->uring_lock);
> > >>> @@ -1227,34 +1226,30 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
> > >>> * the poll to the issued list. Otherwise we can spin here
> > >>> * forever, while the workqueue is stuck trying to acquire the
> > >>> * very same mutex.
> > >>> */
> > >>> if (list_empty(&ctx->iopoll_list) || io_task_work_pending(ctx)) {
> > >>> - u32 tail = ctx->cached_cq_tail;
> > >>> -
> > >>> (void) io_run_local_work_locked(ctx, min_events);
> > >>>
> > >>> if (task_work_pending(current) || list_empty(&ctx->iopoll_list)) {
> > >>> mutex_unlock(&ctx->uring_lock);
> > >>> io_run_task_work();
> > >>> mutex_lock(&ctx->uring_lock);
> > >>> }
> > >>> /* some requests don't go through iopoll_list */
> > >>> - if (tail != ctx->cached_cq_tail || list_empty(&ctx->iopoll_list))
> > >>> + if (list_empty(&ctx->iopoll_list))
> > >>> break;
> > >>> }
> > >>> ret = io_do_iopoll(ctx, !min_events);
> > >>> if (unlikely(ret < 0))
> > >>> return ret;
> > >>>
> > >>> if (task_sigpending(current))
> > >>> return -EINTR;
> > >>> if (need_resched())
> > >>> break;
> > >>> -
> > >>> - nr_events += ret;
> > >>> - } while (nr_events < min_events);
> > >>> + } while (io_cqring_events(ctx) < min_events);
> > >>
> > >> Before entering the loop, if io_cqring_events() finds any queued CQE,
> > >> io_iopoll_check() returns immediately without polling.
> > >>
> > >> If the queued CQE is originated from non-iopoll uring_cmd, iopoll request
> > >> will not be polled, may this be one issue?
> > >
> > > I also noticed that logic and thought it seemed odd. I would think
> > > we'd always want to wait for min_events CQEs (and iopoll once even if
> > > min_events is 0). Looks like Jens added the early return in commit
> > > a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs
> > > pending"), perhaps he can shed some light on it?
> >
> > I don't recall the bug in question, it's been a while... But it always
> > makes sense to return events that are ready, and skip polling. It should
> > only be done if there are no ready events to reap.
>
> Ming, are you okay with preserving that behavior in this patch then? I
> guess there's a potential fairness concern where REQ_F_IOPOLL requests
> may not be polled for some time if non-REQ_F_IOPOLL requests continue
> to frequently post CQEs.
IMO, the fairness may not a big deal given userspace should keep polling
if the iopoll IO isn't done.
But forget to mention, if non-iopoll CQE is posted and ->cq_flush becomes
true, io_submit_flush_completions() may not get chance to run in case of
the early return.
Maybe something like below is needed:
@@ -1556,8 +1556,10 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
* If we do, we can potentially be spinning for commands that
* already triggered a CQE (eg in error).
*/
- if (io_cqring_events(ctx))
+ if (io_cqring_events(ctx)) {
+ io_submit_flush_completions(ctx);
return 0;
+ }
Thanks,
Ming
next prev parent reply other threads:[~2026-03-07 2:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-02 17:29 [PATCH v5 0/5] io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL Caleb Sander Mateos
2026-03-02 17:29 ` [PATCH v5 1/5] io_uring: add REQ_F_IOPOLL Caleb Sander Mateos
2026-03-02 17:29 ` [PATCH v5 2/5] io_uring: remove iopoll_queue from struct io_issue_def Caleb Sander Mateos
2026-03-02 17:29 ` [PATCH v5 3/5] io_uring: count CQEs in io_iopoll_check() Caleb Sander Mateos
2026-03-04 10:32 ` Ming Lei
2026-03-04 15:46 ` Caleb Sander Mateos
2026-03-04 16:29 ` Jens Axboe
2026-03-07 1:38 ` Caleb Sander Mateos
2026-03-07 2:35 ` Ming Lei [this message]
2026-03-16 22:11 ` Jens Axboe
2026-03-16 23:39 ` Caleb Sander Mateos
2026-03-02 17:29 ` [PATCH v5 4/5] io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL Caleb Sander Mateos
2026-03-02 17:29 ` [PATCH v5 5/5] nvme: remove nvme_dev_uring_cmd() IO_URING_F_IOPOLL check Caleb Sander Mateos
2026-03-16 22:14 ` [PATCH v5 0/5] io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL Jens Axboe
2026-03-17 1:01 ` Jens Axboe
2026-03-18 0:47 ` Caleb Sander Mateos
2026-03-18 1:26 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aauO72ocnZRhJkiA@fedora \
--to=ming.lei@redhat.com \
--cc=anuj20.g@samsung.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=hch@lst.de \
--cc=io-uring@vger.kernel.org \
--cc=joshi.k@samsung.com \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.