From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kvm@vger.kernel.org,
virtualization@lists.linux.dev, linux-kselftest@vger.kernel.org,
"Donald Hunter" <donald.hunter@gmail.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"Neal Cardwell" <ncardwell@google.com>,
"David Ahern" <dsahern@kernel.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
"Shuah Khan" <shuah@kernel.org>,
sdf@fomichev.me, dw@davidwei.uk,
"Jamal Hadi Salim" <jhs@mojatatu.com>,
"Victor Nogueira" <victor@mojatatu.com>,
"Pedro Tammela" <pctammela@mojatatu.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v3 5/6] net: devmem: Implement TX path
Date: Wed, 12 Feb 2025 15:53:23 +0000 [thread overview]
Message-ID: <28343e83-6d93-4002-a691-f8273d4d24a8@gmail.com> (raw)
In-Reply-To: <CAHS8izNOqaFe_40gFh09vdBz6-deWdeGu9Aky-e7E+Wu2qtfdw@mail.gmail.com>
On 2/10/25 21:09, Mina Almasry wrote:
> On Wed, Feb 5, 2025 at 4:20 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 2/3/25 22:39, Mina Almasry wrote:
>> ...
>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>> index bb2b751d274a..3ff8f568c382 100644
>>> --- a/include/linux/skbuff.h
>>> +++ b/include/linux/skbuff.h
>>> @@ -1711,9 +1711,12 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
>> ...
>>> int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>>> struct iov_iter *from, size_t length);
>>> @@ -1721,12 +1724,14 @@ int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>>> static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
>>> struct msghdr *msg, int len)
>>> {
>>> - return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len);
>>> + return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len,
>>> + NULL);
>>
>> Instead of propagating it all the way down and carving a new path, why
>> not reuse the existing infra? You already hook into where ubuf is
>> allocated, you can stash the binding in there. And
>
> It looks like it's not possible to increase the side of ubuf_info at
> all, otherwise the BUILD_BUG_ON in msg_zerocopy_alloc() fires.
>
> It's asserting that sizeof(ubuf_info_msgzc) <= sizeof(skb->cb), and
> I'm guessing increasing skb->cb size is not really the way to go.
>
> What I may be able to do here is stash the binding somewhere in
> ubuf_info_msgzc via union with fields we don't need for devmem, and/or
It doesn't need to account the memory against the user, and you
actually don't want that because dmabuf should take care of that.
So, it should be fine to reuse ->mmp.
It's also not a real sk_buff, so maybe maintainers wouldn't mind
reusing some more space out of it, if that would even be needed.
> stashing the binding in ubuf_info_ops (very hacky). Neither approach
> seems ideal, but the former may work and may be cleaner.
>
> I'll take a deeper look here. I had looked before and concluded that
> we're piggybacking devmem TX on MSG_ZEROCOPY path, because we need
> almost all of the functionality there (no copying, send complete
> notifications, etc), with one minor change in the skb filling. I had
> concluded that if MSG_ZEROCOPY was never updated to use the existing
> infra, then it's appropriate for devmem TX piggybacking on top of it
MSG_ZEROCOPY does use the common infra, i.e. passing ubuf_info,
but doesn't need ->sg_from_iter as zerocopy_fill_skb_from_iter()
and it's what was there first.
> to follow that. I would not want to get into a refactor of
> MSG_ZEROCOPY for no real reason.
>
> But I'll take a deeper look here and see if I can make something
> slightly cleaner work.
>
>> zerocopy_fill_skb_from_devmem can implement ->sg_from_iter,
>> see __zerocopy_sg_from_iter().
>>
>> ...
>>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>>> index f0693707aece..c989606ff58d 100644
>>> --- a/net/core/datagram.c
>>> +++ b/net/core/datagram.c
>>> @@ -63,6 +63,8 @@
>>> +static int
>>> +zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from,
>>> + int length,
>>> + struct net_devmem_dmabuf_binding *binding)
>>> +{
>>> + int i = skb_shinfo(skb)->nr_frags;
>>> + size_t virt_addr, size, off;
>>> + struct net_iov *niov;
>>> +
>>> + while (length && iov_iter_count(from)) {
>>> + if (i == MAX_SKB_FRAGS)
>>> + return -EMSGSIZE;
>>> +
>>> + virt_addr = (size_t)iter_iov_addr(from);
>>
>> Unless I missed it somewhere it needs to check that the iter
>> is iovec based.
>>
>
> How do we end up here with an iterator that is not iovec based? Is the
> user able to trigger that somehow and I missed it?
Hopefully not, but for example io_uring passes bvecs for a number of
requests that can end up in tcp_sendmsg_locked(). Those probably
would work with the current patch, but check the order of some of the
checks it will break. And once io_uring starts passing bvecs for
normal send[msg] requests, it'd definitely be possible. And there
are other in kernel users apart from send(2) path, so who knows.
The api allows it and therefore should be checked, it's better to
avoid quite possible latent bugs.
--
Pavel Begunkov
next prev parent reply other threads:[~2025-02-12 15:52 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-03 22:39 [PATCH net-next v3 0/6] Device memory TCP TX Mina Almasry
2025-02-03 22:39 ` [PATCH net-next v3 1/6] net: add devmem TCP TX documentation Mina Almasry
2025-02-03 22:39 ` [PATCH net-next v3 2/6] selftests: ncdevmem: Implement devmem TCP TX Mina Almasry
2025-02-04 12:29 ` Paolo Abeni
2025-02-04 16:50 ` Jakub Kicinski
2025-02-04 17:35 ` Mina Almasry
2025-02-04 17:56 ` Paolo Abeni
2025-02-04 18:03 ` Mina Almasry
2025-02-04 18:07 ` Stanislav Fomichev
2025-02-03 22:39 ` [PATCH net-next v3 3/6] net: add get_netmem/put_netmem support Mina Almasry
2025-02-03 22:39 ` [PATCH net-next v3 4/6] net: devmem: TCP tx netlink api Mina Almasry
2025-02-03 22:39 ` [PATCH net-next v3 5/6] net: devmem: Implement TX path Mina Almasry
2025-02-04 12:15 ` Paolo Abeni
2025-02-05 12:20 ` Pavel Begunkov
2025-02-10 21:09 ` Mina Almasry
2025-02-12 15:53 ` Pavel Begunkov [this message]
2025-02-12 19:18 ` Mina Almasry
2025-02-13 13:18 ` Pavel Begunkov
2025-02-17 23:26 ` Mina Almasry
2025-02-19 22:41 ` Pavel Begunkov
2025-02-20 1:46 ` Mina Almasry
2025-02-20 14:35 ` Pavel Begunkov
2025-02-05 21:56 ` Willem de Bruijn
2025-02-03 22:39 ` [PATCH net-next v3 6/6] net: devmem: make dmabuf unbinding scheduled work Mina Almasry
2025-02-04 12:32 ` [PATCH net-next v3 0/6] Device memory TCP TX Paolo Abeni
2025-02-04 17:27 ` Mina Almasry
2025-02-04 18:06 ` Stanislav Fomichev
2025-02-04 18:32 ` Paolo Abeni
2025-02-04 18:47 ` Mina Almasry
2025-02-04 19:41 ` Stanislav Fomichev
2025-02-05 2:06 ` Jakub Kicinski
2025-02-05 19:53 ` Mina Almasry
2025-02-04 18:38 ` Mina Almasry
2025-02-04 19:43 ` Stanislav Fomichev
2025-02-05 0:47 ` Samiullah Khawaja
2025-02-05 1:05 ` Stanislav Fomichev
2025-02-05 2:08 ` Jakub Kicinski
2025-02-05 19:52 ` Mina Almasry
2025-02-06 1:45 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=28343e83-6d93-4002-a691-f8273d4d24a8@gmail.com \
--to=asml.silence@gmail.com \
--cc=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=donald.hunter@gmail.com \
--cc=dsahern@kernel.org \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=horms@kernel.org \
--cc=jasowang@redhat.com \
--cc=jhs@mojatatu.com \
--cc=kaiyuanz@google.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=mst@redhat.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pctammela@mojatatu.com \
--cc=sdf@fomichev.me \
--cc=sgarzare@redhat.com \
--cc=shuah@kernel.org \
--cc=skhawaja@google.com \
--cc=stefanha@redhat.com \
--cc=victor@mojatatu.com \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox