From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, io-uring@vger.kernel.org
Subject: Re: [PATCH 2/3] io_uring/msg_ring: avoid double indirection task_work for data messages
Date: Tue, 28 May 2024 08:23:58 -0600
Message-ID: <39bc9945-149f-4e48-91fa-9bec19eb74f9@kernel.dk>
In-Reply-To: <a9988b65-2a66-4af8-9fb4-ed7648d96b58@gmail.com>

On 5/28/24 7:18 AM, Pavel Begunkov wrote:
> On 5/24/24 23:58, Jens Axboe wrote:
>> If IORING_SETUP_SINGLE_ISSUER is set, then we can't post CQEs remotely
>> to the target ring. Instead, task_work is queued for the target ring,
>> which is used to post the CQE. To make matters worse, once the target
>> CQE has been posted, task_work is then queued with the originator to
>> fill the completion.
>>
>> This obviously adds a bunch of overhead and latency. Instead of relying
>> on generic kernel task_work for this, fill an overflow entry on the
>> target ring and flag it as such that the target ring will flush it. This
>> avoids both the task_work for posting the CQE, and it means that the
>> originator CQE can be filled inline as well.
>>
>> In local testing, this reduces the latency on the sender side by 5-6x.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> io_uring/msg_ring.c | 77 +++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 74 insertions(+), 3 deletions(-)
>>
>> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
>> index feff2b0822cf..3f89ff3a40ad 100644
>> --- a/io_uring/msg_ring.c
>> +++ b/io_uring/msg_ring.c
>> @@ -123,6 +123,69 @@ static void io_msg_tw_complete(struct callback_head *head)
>>  	io_req_queue_tw_complete(req, ret);
>>  }
>> +static struct io_overflow_cqe *io_alloc_overflow(struct io_ring_ctx *target_ctx)
>> +{
>> +	bool is_cqe32 = target_ctx->flags & IORING_SETUP_CQE32;
>> +	size_t cqe_size = sizeof(struct io_overflow_cqe);
>> +	struct io_overflow_cqe *ocqe;
>> +
>> +	if (is_cqe32)
>> +		cqe_size += sizeof(struct io_uring_cqe);
>> +
>> +	ocqe = kmalloc(cqe_size, GFP_ATOMIC | __GFP_ACCOUNT);
>> +	if (!ocqe)
>> +		return NULL;
>> +
>> +	if (is_cqe32)
>> +		ocqe->cqe.big_cqe[0] = ocqe->cqe.big_cqe[1] = 0;
>> +
>> +	return ocqe;
>> +}
>> +
>> +/*
>> + * Entered with the target uring_lock held, and will drop it before
>> + * returning. Adds a previously allocated ocqe to the overflow list on
>> + * the target, and marks it appropriately for flushing.
>> + */
>> +static void io_msg_add_overflow(struct io_msg *msg,
>> +				struct io_ring_ctx *target_ctx,
>> +				struct io_overflow_cqe *ocqe, int ret)
>> +	__releases(target_ctx->uring_lock)
>> +{
>> +	spin_lock(&target_ctx->completion_lock);
>> +
>> +	if (list_empty(&target_ctx->cq_overflow_list)) {
>> +		set_bit(IO_CHECK_CQ_OVERFLOW_BIT, &target_ctx->check_cq);
>> +		atomic_or(IORING_SQ_TASKRUN, &target_ctx->rings->sq_flags);
>
> TASKRUN? The normal overflow path sets IORING_SQ_CQ_OVERFLOW

Was a bit split on it - we want it run as part of waiting, but I also
wasn't super interested in exposing it as an overflow condition since it
is now. It's more of an internal implementation detail.
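
To make the trade-off concrete, here is a minimal userspace model in C,
offered as a sketch rather than as kernel code. The bit values match
IORING_SQ_CQ_OVERFLOW and IORING_SQ_TASKRUN in <linux/io_uring.h>; the
toy_* names and struct toy_ring are hypothetical, and the sketch assumes
a waiter that enters the kernel whenever either bit is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit values as the uapi header; redefined so this builds standalone. */
#define SQ_CQ_OVERFLOW	(1U << 1)	/* IORING_SQ_CQ_OVERFLOW */
#define SQ_TASKRUN	(1U << 2)	/* IORING_SQ_TASKRUN */

/* Hypothetical, simplified stand-in for the target ring (not io_ring_ctx). */
struct toy_ring {
	_Atomic unsigned sq_flags;	/* models rings->sq_flags, visible to userspace */
	size_t nr_pending;		/* models the length of cq_overflow_list */
};

/* Sender side: queue one entry, flagging only on the empty -> non-empty edge. */
static void toy_add_overflow(struct toy_ring *r, unsigned flag)
{
	if (r->nr_pending == 0)
		atomic_fetch_or(&r->sq_flags, flag);
	r->nr_pending++;
}

/* Target side: assumed wait-path check, true if either bit asks for a flush. */
static bool toy_needs_enter(struct toy_ring *r)
{
	return atomic_load(&r->sq_flags) & (SQ_CQ_OVERFLOW | SQ_TASKRUN);
}

/* What an application watching for CQ overflow would conclude. */
static bool toy_looks_like_overflow(struct toy_ring *r)
{
	return atomic_load(&r->sq_flags) & SQ_CQ_OVERFLOW;
}

/* Usage: flagging with TASKRUN triggers a flush without reporting overflow. */
static void toy_demo(void)
{
	struct toy_ring r = { 0 };

	toy_add_overflow(&r, SQ_TASKRUN);
	assert(toy_needs_enter(&r));
	assert(!toy_looks_like_overflow(&r));
}
```

In this model either flag gets the pending entry flushed when the target
waits; only SQ_CQ_OVERFLOW would also make the message visible to the
application as an overflow condition, which is the exposure the reply
above wants to avoid.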
--
Jens Axboe