From: Jordan Rife <jrife@google.com>
To: netdev@vger.kernel.org
Cc: Jordan Rife <jrife@google.com>,
bpf@vger.kernel.org,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Eric Dumazet <edumazet@google.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Stanislav Fomichev <sdf@fomichev.me>,
Andrii Nakryiko <andrii@kernel.org>,
Yusuke Suzuki <yusuke.suzuki@isovalent.com>,
Jakub Kicinski <kuba@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>
Subject: [PATCH net-next v3 0/4] udp: Preserve UDP socket addresses on abort
Date: Mon, 30 Mar 2026 21:57:00 +0000 [thread overview]
Message-ID: <20260330215707.2374657-1-jrife@google.com> (raw)
BPF cgroup/sock_release hooks can be useful for performing cleanup,
map maintenance, or other bookkeeping when sockets are released. Cilium
uses cgroup/sock_release hooks to do just that, cleaning up maps keyed
by socket destination addresses. This works fine for TCP and connected
UDP sockets when they're closed gracefully, but Yusuke reported in [1]
that behavior is inconsistent following a socket abort.
From the perspective of a BPF sock_release hook, the state of sockets
following a socket abort (udp_abort, tcp_abort) is inconsistent and
surprising. After a sequence like the following,
1. connect(127.0.0.1:10001) or connect(::1:10001)
2. abort (diag_destroy() -> tcp_abort() or udp_abort())
3. close() -> sock_release hook runs
, the state of the sock_release program context differs depending on the
protocol and IP version.
+--------------------------------------------------------------+
| Configuration | ctx fields |
+------+----------+-----------+-----------+---------+----------+
| Case | Protocol | IP Family | dst_ip4 | dst_ip6 | dst_port |
+------+----------+-----------+-----------+---------+----------+
| 1 | TCP | IPv4 | 127.0.0.1 | - | 10001 |
| 2 | TCP | IPv6 | - | ::1 | 10001 |
| 3 | UDP | IPv4 | 0 | - | 0 |
| 4 | UDP | IPv6 | - | ::1 | 0 |
+------+----------+-----------+-----------+---------+----------+
For TCP, the state of dst_ip4/dst_ip6/dst_port are preserved. In the
case of UDP, both dst_ip4 and dst_port are cleared for IPv4 while
dst_ip6 remains intact for IPv6. This can be confusing for users of BPF
like Cilium where a sock_release hook makes use of these fields and
expects them to match a socket's state prior to abort. This series aims
to make the behavior consistent across the board to eliminate this
pitfall by preserving the state of dst_ip4 and dst_port after an abort.
[1]: https://github.com/cilium/cilium/issues/42649
v2: https://lore.kernel.org/netdev/20260303170106.129698-1-jrife@google.com/
CHANGES
=======
v2 -> v3:
* Expand selftests to add coverage for scenarios where a socket is bound
and connected.
* Simply unhashing a socket in udp_abort() as in v2 immediately releases
any ports the socket is bound to. Instead, the socket should hold onto
its port until it is closed just as before (Kuniyuki). v3 employs a
new strategy where inet_daddr and inet_dport are left intact in
__udp_disconnect during udp_abort and bound sockets remain in the
primary and portaddr hashes just as before. At the same time, the
logic in hash lookups is adjusted so that inet_daddr/inet_dport are
ignored unless the socket is currently connected (sk_state ==
TCP_ESTABLISHED).
Kuniyuki mentioned that performance may be a concern with changes to
logic in the fast path. I've tried to address these concerns by
testing the performance of udp4_lib_lookup2 before and after making
the changes.
v1 -> v2:
* Set connect_fd back to -1 after calling destroy() in the selftest
(Jakub).
Jordan Rife (4):
udp: Only compare daddr/dport when sk_state == TCP_ESTABLISHED
udp: Remove disconnected sockets from the 4-tuple hash
udp: Preserve destination address info after abort
selftests/bpf: Ensure dst addr/port are preserved after socket abort
net/ipv4/udp.c | 47 +++--
net/ipv6/udp.c | 18 +-
.../bpf/prog_tests/sock_destroy_release.c | 180 ++++++++++++++++++
.../bpf/progs/sock_destroy_release.c | 56 ++++++
4 files changed, 278 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy_release.c
create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_release.c
--
2.53.0.1118.gaef5881109-goog
next reply other threads:[~2026-03-30 21:57 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-30 21:57 Jordan Rife [this message]
2026-03-30 21:57 ` [PATCH net-next v3 1/4] udp: Only compare daddr/dport when sk_state == TCP_ESTABLISHED Jordan Rife
2026-03-31 1:21 ` Kuniyuki Iwashima
2026-04-01 20:50 ` Jordan Rife
2026-03-30 21:57 ` [PATCH net-next v3 2/4] udp: Remove disconnected sockets from the 4-tuple hash Jordan Rife
2026-03-31 16:51 ` kernel test robot
2026-03-31 17:33 ` kernel test robot
2026-03-31 17:42 ` kernel test robot
2026-03-31 17:55 ` kernel test robot
2026-03-31 18:49 ` kernel test robot
2026-03-30 21:57 ` [PATCH net-next v3 3/4] udp: Preserve destination address info after abort Jordan Rife
2026-03-30 21:57 ` [PATCH net-next v3 4/4] selftests/bpf: Ensure dst addr/port are preserved after socket abort Jordan Rife
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260330215707.2374657-1-jrife@google.com \
--to=jrife@google.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=sdf@fomichev.me \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yusuke.suzuki@isovalent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox