virtualization.lists.linux-foundation.org archive mirror
From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, virtualization@lists.linux.dev,
	kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Donald Hunter" <donald.hunter@gmail.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David Ahern" <dsahern@kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Joe Damato" <jdamato@fastly.com>,
	dw@davidwei.uk
Subject: Re: [PATCH RFC net-next v1 5/5] net: devmem: Implement TX path
Date: Wed, 5 Feb 2025 22:22:31 +0000
Message-ID: <76880ee8-d5ce-458d-b165-c11ce1a23c76@gmail.com>
In-Reply-To: <88cb8f03-7976-4846-a74d-e2d234c5cf8d@gmail.com>

On 2/5/25 22:16, Pavel Begunkov wrote:
> On 2/5/25 20:22, Mina Almasry wrote:
>> On Wed, Feb 5, 2025 at 4:41 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>
>>> On 1/28/25 14:49, Willem de Bruijn wrote:
>>>>>>> +struct net_devmem_dmabuf_binding *
>>>>>>> +net_devmem_get_sockc_binding(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>> +{
>>>>>>> +     struct net_devmem_dmabuf_binding *binding;
>>>>>>> +     int err = 0;
>>>>>>> +
>>>>>>> +     binding = net_devmem_lookup_dmabuf(sockc->dmabuf_id);
>>>>>>
>>>>>> This lookup is from global xarray net_devmem_dmabuf_bindings.
>>>>>>
>>>>>> Is there a check that the socket is sending out through the device
>>>>>> to which this dmabuf was bound with netlink? Should there be?
>>>>>> (e.g., SO_BINDTODEVICE).
>>>>>>
>>>>>
>>>>> Yes, I think it may be an issue if the user triggers a send from a
>>>>> different netdevice, because indeed when we bind a dmabuf we bind it
>>>>> to a specific netdevice.
>>>>>
>>>>> One option is as you say to require TX sockets to be bound and to
>>>>> check that we're bound to the correct netdev. I also wonder if I can
>>>>> make this work without SO_BINDTODEVICE, by querying the netdev the
>>>>> sock is currently trying to send out on and doing a check in the
>>>>> tcp_sendmsg. I'm not sure if this is possible but I'll give it a look.
>>>>
>>>> I was a bit quick on mentioning SO_BINDTODEVICE. Agreed that it is
>>>> vastly preferable to not require that, but infer the device from
>>>> the connected TCP sock.
>>>
>>> Why so? I'd imagine something like SO_BINDTODEVICE is a
>>> better way to go. The user has to do it anyway, otherwise packets
>>> might go to a different device and the user would suddenly start
>>> getting errors with no good way to alleviate them (apart from
>>> the likes of SO_BINDTODEVICE). It's even worse if it works for a while
>>> but starts to unpredictably fail as time passes. With binding at
>>> least it'd fail fast if the setup is not done correctly.
>>>
>>
>> I think there may be a misunderstanding. There is nothing preventing
>> the user from using SO_BINDTODEVICE to make sure the socket is bound to the
> 
> Right, not arguing otherwise
> 
>> ifindex, and the test changes in the latest series actually do this
>> binding.
>>
>> It's just that on TX, we check what device we happen to be going out
>> over, and fail if we're going out of a different device.
>>
>> There are setups where the device will always be correct even without
>> SO_BINDTODEVICE. Like if the host has only 1 interface or if the
>> egress IP is only reachable over 1 interface. I don't see much reason
>> to require the user to SO_BINDTODEVICE in these cases.
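The per-send behaviour described above (look at which device the route actually picked and fail on a mismatch) can be modelled in a few lines of userspace C. This is a toy model, not the patch. The structs, the exact-match routing table with a default entry, and the -ENODEV / -ENETUNREACH error codes are all hypothetical stand-ins. The model also shows why the check always passes on a single-interface host: every destination resolves to the same device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dmabuf binding created via netlink. */
struct dmabuf_binding {
	int ifindex;		/* device the dmabuf was bound to */
};

/* Toy route entry: dst == 0 marks the default route. */
struct route {
	uint32_t dst;
	int ifindex;
};

/* Exact-match lookup that falls back to the default entry; -1 if none. */
static int route_egress(const struct route *tbl, size_t n, uint32_t dst)
{
	int def = -1;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dst == dst)
			return tbl[i].ifindex;
		if (tbl[i].dst == 0)
			def = tbl[i].ifindex;
	}
	return def;
}

/* Per-send check: compare the routed egress device with the binding. */
static int devmem_tx_check(const struct dmabuf_binding *b,
			   const struct route *tbl, size_t n, uint32_t dst)
{
	int egress = route_egress(tbl, n, dst);

	if (egress < 0)
		return -ENETUNREACH;
	return egress == b->ifindex ? 0 : -ENODEV;	/* placeholder errno */
}
```

Adding a more specific route to the same destination through another device (say ifindex 3) makes the same send fail, even though nothing in the application changed.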
> 
> That's exactly the problem. People would test their code with one setup
> where it works just fine, but then there will be a rare user of a
> library used by some other framework or a lonely server where it starts
> to fail for no apparent reason while "it worked before and nothing has
> changed". It's more predictable if enforced.
> 
> I don't think we'd care about the setup overhead of one extra
> setsockopt() call here(?), but with this option we'd need to be careful
> not to race with rebinding, if rebinding is allowed.
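If rebinding stays allowed, one way to avoid racing with it is to read the bound device exactly once and use that single snapshot both to validate and to pick the egress device. In the kernel, this would be a READ_ONCE() of sk->sk_bound_dev_if. The sketch below models that with C11 atomics. The struct and function names are hypothetical, and the sketch only closes the window inside the check itself, not a rebind that happens after it returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical socket model whose bound device may change concurrently. */
struct sock_model {
	_Atomic int bound_dev_if;	/* 0 == not bound */
};

/* Validate against, and return, one snapshot of the bound device. */
static int devmem_tx_pick_dev(struct sock_model *sk, int binding_ifindex,
			      int *out_ifindex)
{
	assert(sk != NULL && out_ifindex != NULL);
	/* Single load: the value that is validated is the value that is used. */
	int dev = atomic_load_explicit(&sk->bound_dev_if, memory_order_relaxed);

	if (dev != binding_ifindex)
		return -ENODEV;	/* placeholder errno */
	*out_ifindex = dev;
	return 0;
}
```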

FWIW, it's surely not a big deal, but it makes for a clearer API.
Hence my curiosity about what the other reasons are.

-- 
Pavel Begunkov


Thread overview: 26+ messages
2024-12-21  0:42 [PATCH RFC net-next v1 0/5] Device memory TCP TX Mina Almasry
2024-12-21  0:42 ` [PATCH RFC net-next v1 1/5] net: add devmem TCP TX documentation Mina Almasry
2024-12-21  4:56   ` Stanislav Fomichev
2025-01-27 22:45     ` Mina Almasry
2025-01-28  3:51       ` Stanislav Fomichev
2024-12-21  0:42 ` [PATCH RFC net-next v1 2/5] selftests: ncdevmem: Implement devmem TCP TX Mina Almasry
2024-12-21  4:57   ` Stanislav Fomichev
2024-12-26 21:24   ` Willem de Bruijn
2024-12-21  0:42 ` [PATCH RFC net-next v1 3/5] net: add get_netmem/put_netmem support Mina Almasry
2024-12-26 19:07   ` Stanislav Fomichev
2025-01-27 22:47     ` Mina Almasry
2024-12-21  0:42 ` [PATCH RFC net-next v1 4/5] net: devmem TCP tx netlink api Mina Almasry
2024-12-21  0:42 ` [PATCH RFC net-next v1 5/5] net: devmem: Implement TX path Mina Almasry
2024-12-21  5:09   ` Stanislav Fomichev
2024-12-26 19:10     ` Stanislav Fomichev
2025-01-27 22:52       ` Mina Almasry
2024-12-26 21:52   ` Willem de Bruijn
2025-01-28  0:06     ` Mina Almasry
2025-01-28 14:49       ` Willem de Bruijn
2025-02-05 12:41         ` Pavel Begunkov
2025-02-05 20:22           ` Mina Almasry
2025-02-05 22:16             ` Pavel Begunkov
2025-02-05 22:22               ` Pavel Begunkov [this message]
2025-02-10 21:14                 ` Mina Almasry
2024-12-28 19:28   ` David Ahern
2024-12-21  4:53 ` [PATCH RFC net-next v1 0/5] Device memory TCP TX Stanislav Fomichev
