From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: netdev@vger.kernel.org
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Shuah Khan <shuah@kernel.org>,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [RFC net-next v1 1/2] udp: Introduce UDP_STOP_RCV option for UDP
Date: Thu, 1 May 2025 11:51:08 +0800 [thread overview]
Message-ID: <20250501035116.69391-1-jiayuan.chen@linux.dev> (raw)
For some services we are using "established-over-unconnected" model.
'''
// create unconnected socket and 'listen()'
srv_fd = socket(AF_INET, SOCK_DGRAM)
setsockopt(srv_fd, SO_REUSEPORT)
bind(srv_fd, SERVER_ADDR, SERVER_PORT)
// 'accept()'
data, client_addr = recvmsg(srv_fd)
// create a connected socket for this request
cli_fd = socket(AF_INET, SOCK_DGRAM)
setsockopt(cli_fd, SO_REUSEPORT)
bind(cli_fd, SERVER_ADDR, SERVER_PORT)
connect(cli, client_addr)
...
// do handshake with cli_fd
'''
This programming pattern simulates accept() using UDP, creating a new
socket for each client request. The server can then use separate sockets
to handle client requests, avoiding the need to use a single UDP socket
for I/O transmission.
But there is a race condition between the bind() and connect() of the
connected socket:
We might receive unexpected packets belonging to the unconnected socket
before connect() is executed, which is not what we need.
(Of course, before connect(), the unconnected socket will also receive
packets from the connected socket, which is easily resolved because
upper-layer protocols typically require explicit boundaries, and we
receive a complete packet before creating a connected socket.)
Before this patch, the connected socket had to filter requests at recvmsg
time, acting as a dispatcher to some extent. With this patch, we can
consider the bind and connect operations to be atomic.
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
include/linux/udp.h | 1 +
include/uapi/linux/udp.h | 1 +
net/ipv4/udp.c | 13 ++++++++++---
net/ipv6/udp.c | 5 +++--
4 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 895240177f4f..8d281a0c0d9d 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -42,6 +42,7 @@ enum {
UDP_FLAGS_ENCAP_ENABLED, /* This socket enabled encap */
UDP_FLAGS_UDPLITE_SEND_CC, /* set via udplite setsockopt */
UDP_FLAGS_UDPLITE_RECV_CC, /* set via udplite setsockopt */
+ UDP_FLAGS_STOP_RCV, /* Stop receiving packets */
};
struct udp_sock {
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index edca3e430305..bb8e0a749a55 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -34,6 +34,7 @@ struct udphdr {
#define UDP_NO_CHECK6_RX 102 /* Disable accepting checksum for UDP6 */
#define UDP_SEGMENT 103 /* Set GSO segmentation size */
#define UDP_GRO 104 /* This socket can receive UDP GRO packets */
+#define UDP_STOP_RCV 105 /* This socket will not receive any packets */
/* UDP encapsulation types */
#define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* unused draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f9f5b92cf4b6..764d337ab1b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -376,7 +376,8 @@ static int compute_score(struct sock *sk, const struct net *net,
if (!net_eq(sock_net(sk), net) ||
udp_sk(sk)->udp_port_hash != hnum ||
- ipv6_only_sock(sk))
+ ipv6_only_sock(sk) ||
+ udp_test_bit(STOP_RCV, sk))
return -1;
if (sk->sk_rcv_saddr != daddr)
@@ -494,7 +495,7 @@ static struct sock *udp4_lib_lookup2(const struct net *net,
result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, udp_ehashfn);
- if (!result) {
+ if (!result || udp_test_bit(STOP_RCV, result)) {
result = sk;
continue;
}
@@ -3031,7 +3032,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
set_xfrm_gro_udp_encap_rcv(up->encap_type, sk->sk_family, sk);
sockopt_release_sock(sk);
break;
-
+ case UDP_STOP_RCV:
+ udp_assign_bit(STOP_RCV, sk, valbool);
+ break;
/*
* UDP-Lite's partial checksum coverage (RFC 3828).
*/
@@ -3120,6 +3123,10 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname,
val = udp_test_bit(GRO_ENABLED, sk);
break;
+ case UDP_STOP_RCV:
+ val = udp_test_bit(STOP_RCV, sk);
+ break;
+
/* The following two cannot be changed on UDP sockets, the return is
* always 0 (which corresponds to the full checksum coverage of UDP). */
case UDPLITE_SEND_CSCOV:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7317f8e053f1..55896a78e94b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -137,7 +137,8 @@ static int compute_score(struct sock *sk, const struct net *net,
if (!net_eq(sock_net(sk), net) ||
udp_sk(sk)->udp_port_hash != hnum ||
- sk->sk_family != PF_INET6)
+ sk->sk_family != PF_INET6 ||
+ udp_test_bit(STOP_RCV, sk))
return -1;
if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr))
@@ -245,7 +246,7 @@ static struct sock *udp6_lib_lookup2(const struct net *net,
result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr),
saddr, sport, daddr, hnum, udp6_ehashfn);
- if (!result) {
+ if (!result || udp_test_bit(STOP_RCV, result)) {
result = sk;
continue;
}
--
2.47.1
next reply other threads:[~2025-05-01 3:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-01 3:51 Jiayuan Chen [this message]
2025-05-01 3:51 ` [RFC net-next v1 2/2] selftests/net: Add udp UDP_STOP_RCV selftest Jiayuan Chen
2025-05-01 4:42 ` [RFC net-next v1 1/2] udp: Introduce UDP_STOP_RCV option for UDP Kuniyuki Iwashima
2025-05-01 6:22 ` Jiayuan Chen
2025-05-01 7:12 ` Kuniyuki Iwashima
2025-05-01 14:27 ` Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250501035116.69391-1-jiayuan.chen@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shuah@kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).