From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Kuniyuki Iwashima" <kuniyu@amazon.com>
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
horms@kernel.org, kuba@kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, netdev@vger.kernel.org,
pabeni@redhat.com, shuah@kernel.org,
willemdebruijn.kernel@gmail.com
Subject: Re: [RFC net-next v1 1/2] udp: Introduce UDP_STOP_RCV option for UDP
Date: Thu, 01 May 2025 06:22:17 +0000
Message-ID: <1f4d3fb4eed397e346efb3ef597e29204e5a2f4b@linux.dev>
In-Reply-To: <20250501044321.83028-1-kuniyu@amazon.com>
2025/5/1 12:42, "Kuniyuki Iwashima" <kuniyu@amazon.com> wrote:
> From: Jiayuan Chen <jiayuan.chen@linux.dev>
> Date: Thu, 1 May 2025 11:51:08 +0800
> > For some services we are using "established-over-unconnected" model.
> >
> > '''
> > // create unconnected socket and 'listen()'
> > srv_fd = socket(AF_INET, SOCK_DGRAM)
> > setsockopt(srv_fd, SO_REUSEPORT)
> > bind(srv_fd, SERVER_ADDR, SERVER_PORT)
> >
> > // 'accept()'
> > data, client_addr = recvmsg(srv_fd)
> >
> > // create a connected socket for this request
> > cli_fd = socket(AF_INET, SOCK_DGRAM)
> > setsockopt(cli_fd, SO_REUSEPORT)
> > bind(cli_fd, SERVER_ADDR, SERVER_PORT)
> > connect(cli_fd, client_addr)
> >
> > ...
> >
> > // do handshake with cli_fd
> >
> > '''
> >
> >
> >
> > This programming pattern simulates accept() using UDP, creating a new
> >
> > socket for each client request. The server can then use separate sockets
> >
> > to handle client requests, avoiding the need to use a single UDP socket
> >
> > for I/O transmission.
> >
> >
> >
> > But there is a race condition between the bind() and connect() of the
> >
> > connected socket:
> >
> > We might receive unexpected packets belonging to the unconnected socket
> >
> > before connect() is executed, which is not what we need.
> >
> > (Of course, before connect(), the unconnected socket will also receive
> >
> > packets from the connected socket, which is easily resolved because
> >
> > upper-layer protocols typically require explicit boundaries, and we
> >
> > receive a complete packet before creating a connected socket.)
> >
> >
> >
> > Before this patch, the connected socket had to filter requests at recvmsg
> >
> > time, acting as a dispatcher to some extent. With this patch, we can
> >
> > consider the bind and connect operations to be atomic.
> >
>
> SO_ATTACH_REUSEPORT_EBPF is what you want.
>
> The socket won't receive any packets until the socket is added to
>
> the BPF map.
>
> No need to reinvent a subset of BPF functionalities.
>
As I understand it, SO_ATTACH_REUSEPORT_EBPF is meant for selecting one
socket, not for filtering out certain sockets.

Does this mean that I first need to collect all sockets bound to the same
port, and then, whenever the kernel would select a socket that I don't
want to receive packets on, implement my own algorithm in the BPF program
to choose another socket from the ones I've collected, so that the
unwanted socket is never returned?

That looks like it completely bypasses the kernel's built-in scoring
logic. Or would extending BPF_PROG_TYPE_SK_REUSEPORT with filtering
capabilities also be an acceptable solution?
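
For context, the recvmsg-time filtering mentioned in the quoted cover
letter could look roughly like the sketch below. It is only an
illustration under assumptions: IPv4 only, and same_peer() and
recv_from_peer() are hypothetical names that do not come from the patch
or from any existing code. Datagrams that were queued on cli_fd between
bind() and connect() stay in its receive queue, so the caller still has
to compare the source address of each one against the expected client.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: true if both IPv4 endpoints have the same
 * address and port. */
bool same_peer(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return a->sin_family == AF_INET && b->sin_family == AF_INET &&
	       a->sin_addr.s_addr == b->sin_addr.s_addr &&
	       a->sin_port == b->sin_port;
}

/* Hypothetical helper: read datagrams from cli_fd until one arrives
 * from the expected peer. Returns its length, or -1 if recvfrom()
 * fails. Datagrams from other sources are skipped here; a real server
 * would hand them back to its dispatcher instead of dropping them. */
ssize_t recv_from_peer(int cli_fd, const struct sockaddr_in *peer,
		       void *buf, size_t len)
{
	for (;;) {
		struct sockaddr_in src;
		socklen_t slen = sizeof(src);
		ssize_t n;

		memset(&src, 0, sizeof(src));
		n = recvfrom(cli_fd, buf, len, 0,
			     (struct sockaddr *)&src, &slen);
		if (n < 0)
			return -1;
		if (same_peer(&src, peer))
			return n;
	}
}
```

This is the per-socket dispatcher work that I would like to avoid in
user space.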