From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
Dylan Yudaken <dyudaken@gmail.com>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH 8/8] io_uring/net: set MSG_MORE if we're doing multishot send and have more
Date: Mon, 26 Feb 2024 07:52:46 -0700
Message-ID: <bb41b293-4b7d-4d12-a991-8b76ed057cee@kernel.dk>
In-Reply-To: <e0a32c54-07bc-4f12-ae76-021a4d17e84f@gmail.com>
On 2/26/24 7:24 AM, Pavel Begunkov wrote:
> On 2/26/24 13:42, Jens Axboe wrote:
>> On 2/26/24 3:59 AM, Dylan Yudaken wrote:
>>> On Sun, Feb 25, 2024 at 12:46 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> If we have more data pending, we know we're going to do one more loop.
>>>> If that's the case, then set MSG_MORE to inform the networking stack
>>>> that there's more data coming shortly for this socket.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>> io_uring/net.c | 10 +++++++---
>>>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/io_uring/net.c b/io_uring/net.c
>>>> index 240b8eff1a78..07307dd5a077 100644
>>>> --- a/io_uring/net.c
>>>> +++ b/io_uring/net.c
>>>> @@ -519,6 +519,10 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>>  	if (!io_check_multishot(req, issue_flags))
>>>>  		return io_setup_async_msg(req, kmsg, issue_flags);
>>>>
>>>> +	flags = sr->msg_flags;
>>>> +	if (issue_flags & IO_URING_F_NONBLOCK)
>>>> +		flags |= MSG_DONTWAIT;
>>>> +
>>>>  retry_multishot:
>>>>  	if (io_do_buffer_select(req)) {
>>>>  		void __user *buf;
>>>> @@ -528,12 +532,12 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>>  		if (!buf)
>>>>  			return -ENOBUFS;
>>>>
>>>> +		if ((req->flags & (REQ_F_BL_EMPTY|REQ_F_APOLL_MULTISHOT)) ==
>>>> +				   REQ_F_APOLL_MULTISHOT)
>>>> +			flags |= MSG_MORE;
>>>>  		iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_SOURCE, buf, len);
>>>>  	}
>>>
>>> This feels racy. I don't have an exact sequence in mind, but I believe
>>> there are cases where between
>>> the two calls to __sys_sendmsg_sock, another submission could be
>>> issued and drain the buffer list.
>>> I guess the result would be that the packet is never sent out, but I
>>> have not followed the codepaths of MSG_MORE.
>>
>> This is true, but that race always exists depending on who gets to go
>> first (the adding of the buffer, or the send itself). The way I see it,
>> when the send is issued we're making the guarantee that we're going to
>> at least deplete the queue as it looks when entered. If more is added
>> while it's being processed, we _may_ see it.
>>
>> Outside of that, we don't want it to potentially run in perpetuity. It
>> may actually be a good idea to make the rule of "just issue what was
>> there when first seen/issued" a hard one, though I don't think it's
>> really worth doing. But making any guarantees on buffers added in
>> parallel will be impossible. If you do that, then you have to deal with
>> figuring out what's left in the queue once you get a completion without
>> CQE_F_MORE.
>>
>>> The obvious other way to trigger this codepath is if the user messes
>>> with the ring by decrementing
>>> the buffer counter. I do not believe there are any nefarious outcomes
>>> - but just to point out that
>>> REQ_F_BL_EMPTY is essentially user controlled.
>>
>> The user may certainly shoot himself in the foot. As long as that
>> doesn't lead to a nefarious outcome, then that's not a concern. For this
>> case, the head is kernel local, user can only write to the tail. So we
>> could have a case of user fiddling with the tail and when we grab the
>> next buffer (and the previous one did not have REQ_F_BL_EMPTY set), the
>> ring will indeed appear to be empty. At that point you get an -ENOBUFS
>> without CQE_F_MORE set.
>
> A side note, don't forget that there are other protocols apart
> from TCP. AFAIK UDP corking will pack it into a single datagram,
> which is not the same as two separate sends.
Yeah, should really have labeled this one as a test/rfc kind of patch. I
wasn't even convinced we want to do this unconditionally for TCP. I'll
just leave it at the end for now, it's a separate kind of discussion
imho and this is why it was left as a separate patch rather than being
bundled with the multishot send in general.
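
To make the UDP point above concrete, here is a minimal userspace sketch
(not part of the patch, and not io_uring code). It assumes Linux and plain
blocking sockets on loopback; the function name udp_msg_more_demo() is made
up for illustration. It sends "hello" with MSG_MORE and then "world"
without it over a connected UDP socket. As far as I understand the UDP
corking behaviour, the receiver should see one 10-byte datagram rather
than two 5-byte ones.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: send two chunks over UDP loopback, the first one
 * flagged with MSG_MORE. Returns the size of the first datagram received,
 * or -1 on error or timeout. The payload is copied into 'out'.
 */
static ssize_t udp_msg_more_demo(char *out, size_t outlen)
{
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	ssize_t n = -1;
	int rx = socket(AF_INET, SOCK_DGRAM, 0);
	int tx = socket(AF_INET, SOCK_DGRAM, 0);

	if (rx < 0 || tx < 0)
		goto out;

	/* Bind the receiver to an ephemeral loopback port. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
		goto out;

	/* Do not block forever if nothing arrives. */
	if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		goto out;

	if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* First chunk is corked, second chunk flushes the pending data. */
	if (send(tx, "hello", 5, MSG_MORE) != 5)
		goto out;
	if (send(tx, "world", 5, 0) != 5)
		goto out;

	/* One recv() returns at most one datagram. */
	n = recv(rx, out, outlen, 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return n;
}
```

Calling udp_msg_more_demo() with a 64-byte buffer is enough to observe the
merged payload. On a TCP socket the same two sends would still be one byte
stream either way, which is why the flag changes semantics for UDP but only
packetization for TCP.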
--
Jens Axboe