linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap
@ 2025-11-05 11:36 Jiayuan Chen
  2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, Matthieu Baerts, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
	netdev, bpf, linux-kselftest

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
   interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
   to replace handlers when TCP connections enter ESTABLISHED state
  (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.

Patch 2 prevents the WARNING from occurring.


[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715

---
v3 -> v4: Addressed questions from Matthieu and Paolo, explained sockmap's
          operational mechanism, and finalized the changes
v2 -> v3: Adopted Jakub Sitnicki's suggestions - atomic retrieval of
          sk_family is required
v1 -> v2: Had initial discussion with Matthieu on sockmap and MPTCP
          technical details

v3: https://lore.kernel.org/bpf/20251023125450.105859-1-jiayuan.chen@linux.dev/
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t

Jiayuan Chen (3):
  mptcp: disallow MPTCP subflows from sockmap
  net,mptcp: fix proto fallback detection with BPF
  selftests/bpf: Add mptcp test with sockmap

 net/mptcp/protocol.c                          |   6 +-
 net/mptcp/subflow.c                           |   8 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
 4 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c


base-commit: 89aec171d9d1ab168e43fcf9754b82e4c0aef9b9
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap
  2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
@ 2025-11-05 11:36 ` Jiayuan Chen
  2025-11-05 14:39   ` Matthieu Baerts
  2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, stable, Matthieu Baerts, Mat Martineau,
	Geliang Tang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
	netdev, bpf, linux-kselftest

The sockmap feature allows bpf syscall from userspace using , or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''
Consider two scenarios:

1. When the server has MPTCP enabled and the client also requests MPTCP,
   the sk passed to the BPF program is a subflow sk. Since subflows only
   handle partial data, replacing their sk_prot is meaningless and will
   cause traffic disruption.

2. When the server has MPTCP enabled but the client sends a TCP SYN
   without MPTCP, subflow_syn_recv_sock() performs a fallback on the
   subflow, replacing the subflow sk's sk_prot with the native sk_prot.
   '''
   subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
   '''
   Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
   converts the subflow to plain TCP.

For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.

For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.

Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/mptcp/subflow.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 30961b3d1702..ddd0fc6fcf45 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -2144,6 +2144,10 @@ void __init mptcp_subflow_init(void)
 	tcp_prot_override = tcp_prot;
 	tcp_prot_override.release_cb = tcp_release_cb_override;
 	tcp_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+	/* Disable sockmap processing for subflows */
+	tcp_prot_override.psock_update_sk_prot = NULL;
+#endif
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 	/* In struct mptcp_subflow_request_sock, we assume the TCP request sock
@@ -2180,6 +2184,10 @@ void __init mptcp_subflow_init(void)
 	tcpv6_prot_override = tcpv6_prot;
 	tcpv6_prot_override.release_cb = tcp_release_cb_override;
 	tcpv6_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+	/* Disable sockmap processing for subflows */
+	tcpv6_prot_override.psock_update_sk_prot = NULL;
+#endif
 #endif
 
 	mptcp_diag_subflow_init(&subflow_ulp_ops);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF
  2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
@ 2025-11-05 11:36 ` Jiayuan Chen
  2025-11-05 14:40   ` Matthieu Baerts
  2025-11-05 11:36 ` [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
  2025-11-05 14:37 ` [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  3 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, stable, Jakub Sitnicki, Matthieu Baerts,
	Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  syn_recv_sock()/subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''

When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
  subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
'''

Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.

This fix uses the more generic sk_family for the comparison instead.

Additionally, this also prevents a WARNING from occurring:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
(net-next/net/mptcp/protocol.c:4005)
Modules linked in:
...

PKRU: 55555554
Call Trace:
<TASK>
do_accept (net-next/net/socket.c:1989)
__sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
__x64_sys_accept (net-next/net/socket.c:2067)
x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
net-next/arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f87ac92b83d

---[ end trace 0000000000000000 ]---

Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/mptcp/protocol.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4cd5df01446e..b5e5e130b158 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 
 static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
 {
+	unsigned short family = READ_ONCE(sk->sk_family);
+
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	if (sk->sk_prot == &tcpv6_prot)
+	if (family == AF_INET6)
 		return &inet6_stream_ops;
 #endif
-	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
+	WARN_ON_ONCE(family != AF_INET);
 	return &inet_stream_ops;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
  2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
@ 2025-11-05 11:36 ` Jiayuan Chen
  2025-11-05 14:40   ` Matthieu Baerts
  2025-11-05 14:37 ` [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  3 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, Matthieu Baerts, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
	netdev, bpf, linux-kselftest

Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
 2 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f8eb7f9d4fd2..56c556f603cc 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -6,11 +6,14 @@
 #include <netinet/in.h>
 #include <test_progs.h>
 #include <unistd.h>
+#include <error.h>
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
+#include "socket_helpers.h"
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_sockmap.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1	"10.0.1.1"
@@ -436,6 +439,151 @@ static void test_subflow(void)
 	close(cgroup_fd);
 }
 
+/* Test sockmap on MPTCP server handling non-mp-capable clients. */
+static void test_sockmap_with_mptcp_fallback(struct mptcp_sockmap *skel)
+{
+	int listen_fd = -1, client_fd1 = -1, client_fd2 = -1;
+	int server_fd1 = -1, server_fd2 = -1, sent, recvd;
+	char snd[9] = "123456789";
+	char rcv[10];
+
+	/* start server with MPTCP enabled */
+	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (!ASSERT_OK_FD(listen_fd, "redirect:start_mptcp_server"))
+		return;
+
+	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+	skel->bss->sk_index = 0;
+	/* create client without MPTCP enabled */
+	client_fd1 = connect_to_fd_opts(listen_fd, NULL);
+	if (!ASSERT_OK_FD(client_fd1, "redirect:connect_to_fd"))
+		goto end;
+
+	server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
+	skel->bss->sk_index = 1;
+	client_fd2 = connect_to_fd_opts(listen_fd, NULL);
+	if (!ASSERT_OK_FD(client_fd2, "redirect:connect_to_fd"))
+		goto end;
+
+	server_fd2 = xaccept_nonblock(listen_fd, NULL, NULL);
+	/* test normal redirect behavior: data sent by client_fd1 can be
+	 * received by client_fd2
+	 */
+	skel->bss->redirect_idx = 1;
+	sent = xsend(client_fd1, snd, sizeof(snd), 0);
+	if (!ASSERT_EQ(sent, sizeof(snd), "redirect:xsend(client_fd1)"))
+		goto end;
+
+	/* try to recv more bytes to avoid truncation check */
+	recvd = recv_timeout(client_fd2, rcv, sizeof(rcv), MSG_DONTWAIT, 2);
+	if (!ASSERT_EQ(recvd, sizeof(snd), "redirect:recv(client_fd2)"))
+		goto end;
+
+end:
+	if (client_fd1 > 1)
+		close(client_fd1);
+	if (client_fd2 > 1)
+		close(client_fd2);
+	if (server_fd1 > 0)
+		close(server_fd1);
+	if (server_fd2 > 0)
+		close(server_fd2);
+	close(listen_fd);
+}
+
+/* Test sockmap rejection of MPTCP sockets - both server and client sides. */
+static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
+{
+	int client_fd1 = -1, client_fd2 = -1;
+	int listen_fd = -1, server_fd = -1;
+	int err, zero = 0;
+
+	/* start server with MPTCP enabled */
+	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))
+		return;
+
+	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+	skel->bss->sk_index = 0;
+	/* create client with MPTCP enabled */
+	client_fd1 = connect_to_fd(listen_fd, 0);
+	if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
+		goto end;
+
+	/* bpf_sock_map_update() called from sockops should reject MPTCP sk */
+	if (!ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
+		goto end;
+
+	/* set trace_port = -1 to stop sockops */
+	skel->bss->trace_port = -1;
+	client_fd2 = connect_to_fd(listen_fd, 0);
+	if (!ASSERT_OK_FD(client_fd2, "connect_to_fd client_fd2"))
+		goto end;
+
+	server_fd = xaccept_nonblock(listen_fd, NULL, NULL);
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
+				  &zero, &server_fd, BPF_NOEXIST);
+	if (!ASSERT_EQ(err, -EOPNOTSUPP, "server should be disallowed"))
+		goto end;
+
+	/* MPTCP client should also be disallowed */
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
+				  &zero, &client_fd1, BPF_NOEXIST);
+	if (!ASSERT_EQ(err, -EOPNOTSUPP, "client should be disallowed"))
+		goto end;
+end:
+	if (client_fd1 > 0)
+		close(client_fd1);
+	if (client_fd2 > 0)
+		close(client_fd2);
+	if (server_fd > 0)
+		close(server_fd);
+	close(listen_fd);
+}
+
+static void test_mptcp_sockmap(void)
+{
+	struct mptcp_sockmap *skel;
+	struct netns_obj *netns;
+	int cgroup_fd, err;
+
+	cgroup_fd = test__join_cgroup("/mptcp_sockmap");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: mptcp_sockmap"))
+		return;
+
+	skel = mptcp_sockmap__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_sockmap"))
+		goto close_cgroup;
+
+	skel->links.mptcp_sockmap_inject =
+		bpf_program__attach_cgroup(skel->progs.mptcp_sockmap_inject, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.mptcp_sockmap_inject, "attach sockmap"))
+		goto skel_destroy;
+
+	err = bpf_prog_attach(bpf_program__fd(skel->progs.mptcp_sockmap_redirect),
+			      bpf_map__fd(skel->maps.sock_map),
+			      BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto skel_destroy;
+
+	netns = netns_new(NS_TEST, true);
+	if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_sockmap"))
+		goto skel_destroy;
+
+	if (endpoint_init("subflow") < 0)
+		goto close_netns;
+
+	test_sockmap_with_mptcp_fallback(skel);
+	test_sockmap_reject_mptcp(skel);
+
+close_netns:
+	netns_free(netns);
+skel_destroy:
+	mptcp_sockmap__destroy(skel);
+close_cgroup:
+	close(cgroup_fd);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -444,4 +592,6 @@ void test_mptcp(void)
 		test_mptcpify();
 	if (test__start_subtest("subflow"))
 		test_subflow();
+	if (test__start_subtest("sockmap"))
+		test_mptcp_sockmap();
 }
diff --git a/tools/testing/selftests/bpf/progs/mptcp_sockmap.c b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
new file mode 100644
index 000000000000..d4eef0cbadb9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+int sk_index;
+int redirect_idx;
+int trace_port;
+int helper_ret;
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 100);
+} sock_map SEC(".maps");
+
+SEC("sockops")
+int mptcp_sockmap_inject(struct bpf_sock_ops *skops)
+{
+	struct bpf_sock *sk;
+
+	/* only accept specified connection */
+	if (skops->local_port != trace_port ||
+	    skops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
+		return 1;
+
+	sk = skops->sk;
+	if (!sk)
+		return 1;
+
+	/* update sk handler */
+	helper_ret = bpf_sock_map_update(skops, &sock_map, &sk_index, BPF_NOEXIST);
+
+	return 1;
+}
+
+SEC("sk_skb/stream_verdict")
+int mptcp_sockmap_redirect(struct __sk_buff *skb)
+{
+	/* redirect skb to the sk under sock_map[redirect_idx] */
+	return bpf_sk_redirect_map(skb, &sock_map, redirect_idx, 0);
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
                   ` (2 preceding siblings ...)
  2025-11-05 11:36 ` [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
@ 2025-11-05 14:37 ` Matthieu Baerts
  3 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:37 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

On 05/11/2025 12:36, Jiayuan Chen wrote:
> Overall, we encountered a warning [1] that can be triggered by running the
> selftest I provided.

Thank you for the v4!

> sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
> implementing fast socket-level forwarding logic:
> 1. Users can obtain file descriptors through userspace socket()/accept()
>    interfaces, then call BPF syscall to perform these replacements.
> 2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
>    to replace handlers when TCP connections enter ESTABLISHED state
>   (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)
> 
> However, when combined with MPTCP, an issue arises: MPTCP creates subflow
> sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
> and may incorrectly replace their sk_prot. We need to reject such
> operations. In patch 1, we set psock_update_sk_prot to NULL in the
> subflow's custom sk_prot.

This new version looks good to me. I have some small comments on patches
1 and 2 that can only be addressed if a v5 is needed I think.

I have some questions for the 3rd patch. It would be good if someone
else with more experience with the BPF selftests can also look at it.

> Additionally, if the server's listening socket has MPTCP enabled and the
> client's TCP also uses MPTCP, we should allow the combination of subflow
> and sockmap. This is because the latest Golang programs have enabled MPTCP
> for listening sockets by default [2]. For programs already using sockmap,
> upgrading Golang should not cause sockmap functionality to fail.

Note: even if these patches here are needed to avoid stream corruption
and other issues, in your specific case with sockmap, I think it would
be better to set this env var until MPTCP support is added to sockmap:

  GODEBUG=multipathtcp=0

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap
  2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
@ 2025-11-05 14:39   ` Matthieu Baerts
  0 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:39 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Mat Martineau, Geliang Tang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

Hi Jiayuan,

On 05/11/2025 12:36, Jiayuan Chen wrote:
> The sockmap feature allows bpf syscall from userspace using , or based

(is there a word missing before the ','?)

> on bpf sockops, replacing the sk_prot of sockets during protocol stack
> processing with sockmap's custom read/write interfaces.
> '''
> tcp_rcv_state_process()
>   subflow_syn_recv_sock()
>     tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
>       bpf_skops_established       <== sockops
>         bpf_sock_map_update(sk)   <== call bpf helper
>           tcp_bpf_update_proto()  <== update sk_prot
> '''
> Consider two scenarios:
> 
> 1. When the server has MPTCP enabled and the client also requests MPTCP,
>    the sk passed to the BPF program is a subflow sk. Since subflows only
>    handle partial data, replacing their sk_prot is meaningless and will
>    cause traffic disruption.
> 
> 2. When the server has MPTCP enabled but the client sends a TCP SYN
>    without MPTCP, subflow_syn_recv_sock() performs a fallback on the
>    subflow, replacing the subflow sk's sk_prot with the native sk_prot.
>    '''
>    subflow_ulp_fallback()
>     subflow_drop_ctx()
>       mptcp_subflow_ops_undo_override()
>    '''
>    Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
>    converts the subflow to plain TCP.
> 
> For the first case, we should prevent it from being combined with sockmap
> by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
> sockmap's own flow.
> 
> For the second case, since subflow_syn_recv_sock() has already restored
> sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.
> 
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")

It should not change anything for the backport, but for this patch, the
Fixes tag can be older I think, e.g.

Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing
connections")

If you need to send a v5, please use this one.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF
  2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
@ 2025-11-05 14:40   ` Matthieu Baerts
  0 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:40 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Jakub Sitnicki, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
	netdev, bpf, linux-kselftest

Hi Jiayuan,

On 05/11/2025 12:36, Jiayuan Chen wrote:

If you need to send a v5, please remove the 'net,' prefix from the
title. And maybe good to mention 'sockmap', e.g.

  mptcp: fix proto fallback detection with sockmap

> The sockmap feature allows bpf syscall from userspace, or based
> on bpf sockops, replacing the sk_prot of sockets during protocol stack
> processing with sockmap's custom read/write interfaces.
> '''
> tcp_rcv_state_process()
>   syn_recv_sock()/subflow_syn_recv_sock()
>     tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
>       bpf_skops_established       <== sockops
>         bpf_sock_map_update(sk)   <== call bpf helper
>           tcp_bpf_update_proto()  <== update sk_prot
> '''
> 
> When the server has MPTCP enabled but the client sends a TCP SYN
> without MPTCP, subflow_syn_recv_sock() performs a fallback on the
> subflow, replacing the subflow sk's sk_prot with the native sk_prot.
> '''
> subflow_syn_recv_sock()
>   subflow_ulp_fallback()
>     subflow_drop_ctx()
>       mptcp_subflow_ops_undo_override()
> '''
> 
> Then, this subflow can be normally used by sockmap, which replaces the
> native sk_prot with sockmap's custom sk_prot. The issue occurs when the
> user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
> Here, it uses sk->sk_prot to compare with the native sk_prot, but this
> is incorrect when sockmap is used, as we may incorrectly set
> sk->sk_socket->ops.
> 
> This fix uses the more generic sk_family for the comparison instead.
> 
> Additionally, this also prevents a WARNING from occurring:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
> mptcp_stream_accept+0x34c/0x380
> Modules linked in:
> RIP: 0010:mptcp_stream_accept+0x34c/0x380
> RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  do_accept+0xeb/0x190
>  ? __x64_sys_pselect6+0x61/0x80
>  ? _raw_spin_unlock+0x12/0x30
>  ? alloc_fd+0x11e/0x190
>  __sys_accept4+0x8c/0x100
>  __x64_sys_accept+0x1f/0x30
>  x64_sys_call+0x202f/0x20f0
>  do_syscall_64+0x72/0x9a0
>  ? switch_fpu_return+0x60/0xf0
>  ? irqentry_exit_to_user_mode+0xdb/0x1e0
>  ? irqentry_exit+0x3f/0x50
>  ? clear_bhb_loop+0x50/0xa0
>  ? clear_bhb_loop+0x50/0xa0
>  ? clear_bhb_loop+0x50/0xa0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  </TASK>
> ---[ end trace 0000000000000000 ]---
> 
> result from ./scripts/decode_stacktrace.sh:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
> (net-next/net/mptcp/protocol.c:4005)
> Modules linked in:
> ...
> 
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept (net-next/net/socket.c:1989)
> __sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
> __x64_sys_accept (net-next/net/socket.c:2067)
> x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
> do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
> net-next/arch/x86/entry/syscall_64.c:94)
> entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f87ac92b83d
> 
> ---[ end trace 0000000000000000 ]---

If you need to send a v5, please remove the non-decoded stacktrace, only
the decoded one is interesting. You can also remove the 'net-next/'
prefix in the paths. So only to keep 'net/mptcp/protocol.c:4005' for
example.

> 
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  net/mptcp/protocol.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4cd5df01446e..b5e5e130b158 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>  
>  static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>  {
> +	unsigned short family = READ_ONCE(sk->sk_family);
> +
>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> -	if (sk->sk_prot == &tcpv6_prot)
> +	if (family == AF_INET6)
>  		return &inet6_stream_ops;
>  #endif
> -	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> +	WARN_ON_ONCE(family != AF_INET);

I wonder if it would be interesting to return NULL if the family is not
AF_INET{,6}. But I guess this will cause a crash quickly after, no?

If yes, probably better to continue returning &inet_stream_ops here.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

>  	return &inet_stream_ops;
>  }
>  

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-11-05 11:36 ` [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
@ 2025-11-05 14:40   ` Matthieu Baerts
  2025-11-05 16:12     ` Jiayuan Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:40 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

Thank you for this new test!

I'm not very familiar with the BPF selftests: it would be nice if
someone else can have a quick look.

On 05/11/2025 12:36, Jiayuan Chen wrote:
> Add test cases to verify that when MPTCP falls back to plain TCP sockets,
> they can properly work with sockmap.
> 
> Additionally, add test cases to ensure that sockmap correctly rejects
> MPTCP sockets as expected.
> 
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>  .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
>  .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
>  2 files changed, 193 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> index f8eb7f9d4fd2..56c556f603cc 100644
> --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> @@ -6,11 +6,14 @@
>  #include <netinet/in.h>
>  #include <test_progs.h>
>  #include <unistd.h>
> +#include <error.h>

Do you use this new include?

>  #include "cgroup_helpers.h"
>  #include "network_helpers.h"
> +#include "socket_helpers.h"
>  #include "mptcp_sock.skel.h"
>  #include "mptcpify.skel.h"
>  #include "mptcp_subflow.skel.h"
> +#include "mptcp_sockmap.skel.h"
>  
>  #define NS_TEST "mptcp_ns"
>  #define ADDR_1	"10.0.1.1"
> @@ -436,6 +439,151 @@ static void test_subflow(void)
>  	close(cgroup_fd);
>  }
>  
> +/* Test sockmap on MPTCP server handling non-mp-capable clients. */
> +static void test_sockmap_with_mptcp_fallback(struct mptcp_sockmap *skel)
> +{
> +	int listen_fd = -1, client_fd1 = -1, client_fd2 = -1;
> +	int server_fd1 = -1, server_fd2 = -1, sent, recvd;
> +	char snd[9] = "123456789";
> +	char rcv[10];
> +
> +	/* start server with MPTCP enabled */
> +	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
> +	if (!ASSERT_OK_FD(listen_fd, "redirect:start_mptcp_server"))
> +		return;
> +
> +	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
> +	skel->bss->sk_index = 0;
> +	/* create client without MPTCP enabled */
> +	client_fd1 = connect_to_fd_opts(listen_fd, NULL);
> +	if (!ASSERT_OK_FD(client_fd1, "redirect:connect_to_fd"))
> +		goto end;
> +
> +	server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
> +	skel->bss->sk_index = 1;
> +	client_fd2 = connect_to_fd_opts(listen_fd, NULL);
> +	if (!ASSERT_OK_FD(client_fd2, "redirect:connect_to_fd"))
> +		goto end;
> +
> +	server_fd2 = xaccept_nonblock(listen_fd, NULL, NULL);
> +	/* test normal redirect behavior: data sent by client_fd1 can be
> +	 * received by client_fd2
> +	 */
> +	skel->bss->redirect_idx = 1;
> +	sent = xsend(client_fd1, snd, sizeof(snd), 0);
> +	if (!ASSERT_EQ(sent, sizeof(snd), "redirect:xsend(client_fd1)"))
> +		goto end;
> +
> +	/* try to recv more bytes to avoid truncation check */
> +	recvd = recv_timeout(client_fd2, rcv, sizeof(rcv), MSG_DONTWAIT, 2);
> +	if (!ASSERT_EQ(recvd, sizeof(snd), "redirect:recv(client_fd2)"))
> +		goto end;
> +
> +end:
> +	if (client_fd1 > 1)
> +		close(client_fd1);
> +	if (client_fd2 > 1)
> +		close(client_fd2);
> +	if (server_fd1 > 0)
> +		close(server_fd1);
> +	if (server_fd2 > 0)
> +		close(server_fd2);

Why do you check if it is above 0 or 1? Should you not always check if
it is >= 0 for each fd?


> +	close(listen_fd);
> +}
> +
> +/* Test sockmap rejection of MPTCP sockets - both server and client sides. */
> +static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
> +{
> +	int client_fd1 = -1, client_fd2 = -1;
> +	int listen_fd = -1, server_fd = -1;
> +	int err, zero = 0;
> +
> +	/* start server with MPTCP enabled */
> +	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
> +	if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))

In test_sockmap_with_mptcp_fallback(), you prefixed each error with
'redirect:'. Should you also have a different prefix here? 'sockmap-fb:'
vs 'sockmap-mptcp:' eventually?

> +		return;
> +
> +	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
> +	skel->bss->sk_index = 0;
> +	/* create client with MPTCP enabled */
> +	client_fd1 = connect_to_fd(listen_fd, 0);
> +	if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
> +		goto end;
> +
> +	/* bpf_sock_map_update() called from sockops should reject MPTCP sk */
> +	if (!ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
> +		goto end;

So here, the client is connected, but sockmap doesn't operate on it,
right? So most likely, the connection is stalled until the userspace
realises that and takes an action?

> +	/* set trace_port = -1 to stop sockops */
> +	skel->bss->trace_port = -1;

What do you want to demonstrate from here? That without the sockmap
injection, there are no new entries added? Is it worth checking that here?

> +	client_fd2 = connect_to_fd(listen_fd, 0);
> +	if (!ASSERT_OK_FD(client_fd2, "connect_to_fd client_fd2"))
> +		goto end;
> +
> +	server_fd = xaccept_nonblock(listen_fd, NULL, NULL);
> +	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
> +				  &zero, &server_fd, BPF_NOEXIST);
> +	if (!ASSERT_EQ(err, -EOPNOTSUPP, "server should be disallowed"))
> +		goto end;
> +
> +	/* MPTCP client should also be disallowed */
> +	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
> +				  &zero, &client_fd1, BPF_NOEXIST);
> +	if (!ASSERT_EQ(err, -EOPNOTSUPP, "client should be disallowed"))
> +		goto end;
> +end:
> +	if (client_fd1 > 0)
> +		close(client_fd1);
> +	if (client_fd2 > 0)
> +		close(client_fd2);
> +	if (server_fd > 0)
> +		close(server_fd);

Same here: should it not be "*fd >= 0"?

> +	close(listen_fd);
> +}

(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-11-05 14:40   ` Matthieu Baerts
@ 2025-11-05 16:12     ` Jiayuan Chen
  2025-11-05 16:28       ` Matthieu Baerts
  0 siblings, 1 reply; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-05 16:12 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp
  Cc: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

November 5, 2025 at 22:40, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:


> 
> Hi Jiayuan,
> 
> Thank you for this new test!
> 
> I'm not very familiar with the BPF selftests: it would be nice if
> someone else can have a quick look.

Thanks for the review. I've seen the feedback on the other patches(1/3, 2/3) and will fix them up.


> On 05/11/2025 12:36, Jiayuan Chen wrote:
> 
> > 
> > Add test cases to verify that when MPTCP falls back to plain TCP sockets,
> >  they can properly work with sockmap.
> >  
> >  Additionally, add test cases to ensure that sockmap correctly rejects
> >  MPTCP sockets as expected.
> >  
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >  ---
> >  .../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++
> >  .../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++
> >  2 files changed, 193 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> >  
> >  diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  index f8eb7f9d4fd2..56c556f603cc 100644
> >  --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  @@ -6,11 +6,14 @@
> >  #include <netinet/in.h>
> >  #include <test_progs.h>
> >  #include <unistd.h>
> >  +#include <error.h>
> > 
> Do you use this new include?

"EOPNOTSUPP" I used was defined in error.h.

> > 
> >  +
> >  +end:
> >  + if (client_fd1 > 1)
> >  + close(client_fd1);
> >  + if (client_fd2 > 1)
> >  + close(client_fd2);
> >  + if (server_fd1 > 0)
> >  + close(server_fd1);
> >  + if (server_fd2 > 0)
> >  + close(server_fd2);
> > 
> Why do you check if it is above 0 or 1? Should you not always check if
> it is >= 0 for each fd?

My bad, ">=0" is correct.

> > 
> > + close(listen_fd);
> >  +}
> >  +
> >  +/* Test sockmap rejection of MPTCP sockets - both server and client sides. */
> >  +static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
> >  +{
> >  + int client_fd1 = -1, client_fd2 = -1;
> >  + int listen_fd = -1, server_fd = -1;
> >  + int err, zero = 0;
> >  +
> >  + /* start server with MPTCP enabled */
> >  + listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
> >  + if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))
> > 
> In test_sockmap_with_mptcp_fallback(), you prefixed each error with
> 'redirect:'. Should you also have a different prefix here? 'sockmap-fb:'
> vs 'sockmap-mptcp:' eventually?

I will do it.

> > 
> > + return;
> >  +
> >  + skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
> >  + skel->bss->sk_index = 0;
> >  + /* create client with MPTCP enabled */
> >  + client_fd1 = connect_to_fd(listen_fd, 0);
> >  + if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
> >  + goto end;
> >  +
> >  + /* bpf_sock_map_update() called from sockops should reject MPTCP sk */
> >  + if (!ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
> >  + goto end;
> > 
> So here, the client is connected, but sockmap doesn't operate on it,
> right? So most likely, the connection is stalled until the userspace
> realises that and takes an action?
>

It depends. Sockmap usually runs as a bypass. The user app (like Nginx)
has its own native forwarding logic, and sockmap just kicks in to accelerate
it. So in known cases, turning off sockmap falls back to the native logic.
But if there's no native logic, the connection just stalls.


> > 
> > + /* set trace_port = -1 to stop sockops */
> >  + skel->bss->trace_port = -1;
> > 
> What do you want to demonstrate from here? That without the sockmap
> injection, there are no new entries added? Is it worth checking that here?

That's redundant. I'll drop it.


[...]
> >  + if (client_fd1 > 0)
> >  + close(client_fd1);
> >  + if (client_fd2 > 0)
> >  + close(client_fd2);
> >  + if (server_fd > 0)
> >  + close(server_fd);
> > 
> Same here: should it not be "*fd >= 0"?

I will fix it.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-11-05 16:12     ` Jiayuan Chen
@ 2025-11-05 16:28       ` Matthieu Baerts
  2025-11-06  1:46         ` Jiayuan Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Matthieu Baerts @ 2025-11-05 16:28 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

On 05/11/2025 17:12, Jiayuan Chen wrote:
> November 5, 2025 at 22:40, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:
> 
> 
>>
>> Hi Jiayuan,
>>
>> Thank you for this new test!
>>
>> I'm not very familiar with the BPF selftests: it would be nice if
>> someone else can have a quick look.
> 
> Thanks for the review. I've seen the feedback on the other patches(1/3, 2/3) and will fix them up.

Thanks!

>> On 05/11/2025 12:36, Jiayuan Chen wrote:
>>
>>>
>>> Add test cases to verify that when MPTCP falls back to plain TCP sockets,
>>>  they can properly work with sockmap.
>>>  
>>>  Additionally, add test cases to ensure that sockmap correctly rejects
>>>  MPTCP sockets as expected.
>>>  
>>>  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>>>  ---
>>>  .../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++
>>>  .../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++
>>>  2 files changed, 193 insertions(+)
>>>  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
>>>  
>>>  diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
>>>  index f8eb7f9d4fd2..56c556f603cc 100644
>>>  --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
>>>  +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
>>>  @@ -6,11 +6,14 @@
>>>  #include <netinet/in.h>
>>>  #include <test_progs.h>
>>>  #include <unistd.h>
>>>  +#include <error.h>
>>>
>> Do you use this new include?
> 
> "EOPNOTSUPP" I used was defined in error.h.

Ah OK. I usually only include 'error.h' to use 'error()'.
Is it not 'errno.h' (or 'linux/errno.h') you want instead?

I'm just surprised it is not already included but another one above. But
OK if it is not.

(...)

>>> + return;
>>>  +
>>>  + skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
>>>  + skel->bss->sk_index = 0;
>>>  + /* create client with MPTCP enabled */
>>>  + client_fd1 = connect_to_fd(listen_fd, 0);
>>>  + if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
>>>  + goto end;
>>>  +
>>>  + /* bpf_sock_map_update() called from sockops should reject MPTCP sk */
>>>  + if (!ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
>>>  + goto end;
>>>
>> So here, the client is connected, but sockmap doesn't operate on it,
>> right? So most likely, the connection is stalled until the userspace
>> realises that and takes an action?
>>
> 
> It depends. Sockmap usually runs as a bypass. The user app (like Nginx)
> has its own native forwarding logic, and sockmap just kicks in to accelerate
> it. So in known cases, turning off sockmap falls back to the native logic.
> But if there's no native logic, the connection just stalls.

Good to know, thanks!

So MPTCP request might still be handled by the "native logic" if any?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-11-05 16:28       ` Matthieu Baerts
@ 2025-11-06  1:46         ` Jiayuan Chen
  0 siblings, 0 replies; 11+ messages in thread
From: Jiayuan Chen @ 2025-11-06  1:46 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp
  Cc: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

November 6, 2025 at 24:28, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:


> 
> On 05/11/2025 17:12, Jiayuan Chen wrote:
> 
> > 
> > November 5, 2025 at 22:40, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:
> >  
> >  
> > 
> > > 
> > > Hi Jiayuan,
> > > 
> > >  Thank you for this new test!
> > > 
> > >  I'm not very familiar with the BPF selftests: it would be nice if
> > >  someone else can have a quick look.
> > > 
> >  
> >  Thanks for the review. I've seen the feedback on the other patches(1/3, 2/3) and will fix them up.
> > 
> Thanks!
> 
> > 
> > > 
> > > On 05/11/2025 12:36, Jiayuan Chen wrote:
> > > 
> >  Add test cases to verify that when MPTCP falls back to plain TCP sockets,
> >  they can properly work with sockmap.
> >  
> >  Additionally, add test cases to ensure that sockmap correctly rejects
> >  MPTCP sockets as expected.
> >  
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >  ---
> >  .../testing/selftests/bpf/prog_tests/mptcp.c | 150 ++++++++++++++++++
> >  .../selftests/bpf/progs/mptcp_sockmap.c | 43 +++++
> >  2 files changed, 193 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> >  
> >  diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  index f8eb7f9d4fd2..56c556f603cc 100644
> >  --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> >  @@ -6,11 +6,14 @@
> >  #include <netinet/in.h>
> >  #include <test_progs.h>
> >  #include <unistd.h>
> >  +#include <error.h>
> > 
> > > 
> > > Do you use this new include?
> > > 
> >  
> >  "EOPNOTSUPP" I used was defined in error.h.
> > 
> Ah OK. I usually only include 'error.h' to use 'error()'.
> Is it not 'errno.h' (or 'linux/errno.h') you want instead?
> 
> I'm just surprised it is not already included but another one above. But
> OK if it is not.


Okay, I'll look into it and see if I can get rid of the error.h header.

> > 
> > > 
> > > So here, the client is connected, but sockmap doesn't operate on it,
> > >  right? So most likely, the connection is stalled until the userspace
> > >  realises that and takes an action?
> > > 
> >  
> >  It depends. Sockmap usually runs as a bypass. The user app (like Nginx)
> >  has its own native forwarding logic, and sockmap just kicks in to accelerate
> >  it. So in known cases, turning off sockmap falls back to the native logic.
> >  But if there's no native logic, the connection just stalls.
> > 
> Good to know, thanks!
> 
> So MPTCP request might still be handled by the "native logic" if any?
> 

Yes. If native logic exists, simply blocking the mixing of MPTCP and sockmap
should mostly keep the user's app working.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-11-06  1:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
2025-11-05 14:39   ` Matthieu Baerts
2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
2025-11-05 14:40   ` Matthieu Baerts
2025-11-05 11:36 ` [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
2025-11-05 14:40   ` Matthieu Baerts
2025-11-05 16:12     ` Jiayuan Chen
2025-11-05 16:28       ` Matthieu Baerts
2025-11-06  1:46         ` Jiayuan Chen
2025-11-05 14:37 ` [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).